biom-format Table objects¶
The biom-format project provides rich Table objects to support use of the BIOM file format. The objects encapsulate matrix data (such as OTU counts) and abstract the interaction away from the programmer. The programmer therefore does not need to know what the underlying data object is, which in turn allows different data representations to be supported. Currently, biom-format supports a dense object built on numpy.array (NumPy) and a sparse object built on Python dictionaries.
biom-format table_factory method¶
Generally, construction of a Table subclass will be through the table_factory method. This method facilitates any necessary data conversions and supports a wide variety of input data types.
- biom.table.table_factory(data, sample_ids, observation_ids, sample_metadata=None, observation_metadata=None, table_id=None, constructor=<class 'biom.table.SparseOTUTable'>, **kwargs)¶
Construct a table
Attempts to make ‘data’ sane with respect to the constructor type through various means of juggling. Data can be:
- numpy.array
- list of numpy.array vectors
- SparseDict representation
- dict representation
- list of SparseDict representation vectors
- list of lists of sparse values [[row, col, value], ...]
- list of lists of dense values [[value, value, ...], ...]
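The two list-of-lists layouts above describe the same matrix in different ways. The following is a minimal sketch in plain Python that does not call biom-format at all; `sparse_to_dense` is a hypothetical helper written for illustration, and the shape is passed explicitly because sparse triples do not carry it.

```python
def sparse_to_dense(triples, n_rows, n_cols):
    """Expand [[row, col, value], ...] into a dense list of rows."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in triples:
        dense[row][col] = value
    return dense


# Hypothetical 2 x 3 matrix with three nonzero entries.
sparse_values = [[0, 0, 5], [0, 2, 1], [1, 1, 7]]
dense_values = sparse_to_dense(sparse_values, n_rows=2, n_cols=3)
print(dense_values)  # [[5, 0, 1], [0, 7, 0]]
```

Either layout could then be handed to a factory as the data argument, together with matching sample and observation IDs.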
Description of available Table objects¶
There are multiple objects available, but some of them are unofficial abstract base classes (for historical reasons, they do not use the abc module). In practice, the objects used should be the derived Tables such as SparseOTUTable or DenseGeneTable.
Abstract base classes¶
Abstract base classes establish standard interfaces for subclassed types and provide common functionality for derived types.
Table¶
Table is a container object and an abstract base class that provides a common and required API for subclassed objects. Through the use of private interfaces, it is possible to create public methods that operate on the underlying datatype without having to implement each method in each subclass. For instance, Table.iterSamplesData will return a generator that always yields numpy.array vectors for each sample regardless of how the table data is actually stored. This functionality results from derived classes implementing private interfaces, such as Table._conv_to_np.
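The private-interface pattern described above can be sketched roughly as follows. All class and method names here (`MiniTable`, `MiniDenseTable`, `MiniSparseTable`, `_sample_vector`, `iter_sample_data`) are hypothetical stand-ins, not biom-format classes, and plain lists take the place of numpy.array vectors.

```python
class MiniTable:
    """Hypothetical stand-in for an abstract base class such as Table."""

    def __init__(self, data, n_obs, n_samples):
        self._data = data
        self._n_obs = n_obs
        self._n_samples = n_samples

    def _sample_vector(self, col):
        # Private interface: each subclass knows how to read its own storage.
        raise NotImplementedError

    def iter_sample_data(self):
        # Public method written once, in terms of the private interface.
        for col in range(self._n_samples):
            yield self._sample_vector(col)


class MiniDenseTable(MiniTable):
    def _sample_vector(self, col):
        # Data is a list of observation rows.
        return [row[col] for row in self._data]


class MiniSparseTable(MiniTable):
    def _sample_vector(self, col):
        # Data is a dict keyed by (row, col); missing keys count as zero.
        return [self._data.get((row, col), 0) for row in range(self._n_obs)]


dense = MiniDenseTable([[5, 0, 1], [0, 7, 0]], n_obs=2, n_samples=3)
sparse = MiniSparseTable({(0, 0): 5, (0, 2): 1, (1, 1): 7}, n_obs=2, n_samples=3)

# Both storage layouts yield the same per-sample vectors.
dense_vectors = list(dense.iter_sample_data())
sparse_vectors = list(sparse.iter_sample_data())
```

The public iterator never inspects the storage directly, which is why one implementation can serve every subclass.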
- class biom.table.Table(Data, SampleIds, ObservationIds, SampleMetadata=None, ObservationMetadata=None, TableId=None, **kwargs)¶
Abstract base class for what a Table is
- addObservationMetadata(md)¶
Take a dict of metadata and add it to an observation.
{observation_id:{dict_of_metadata}}
- addSampleMetadata(md)¶
Take a dict of metadata and add it to a sample.
{sample_id:{dict_of_metadata}}
- binObservationsByMetadata(f)¶
Yields tables by metadata
f is given the observation metadata by row and must return what “bin” the observation is part of.
- binSamplesByMetadata(f)¶
Yields tables by metadata
f is given the sample metadata by row and must return what “bin” the sample is part of.
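A binning function only has to map one metadata dict to a bin label. The sketch below shows that idea without using biom-format; the `BODY_SITE` key, the sample IDs, and the grouping loop are all hypothetical and stand in for whatever the table does internally.

```python
from collections import defaultdict


def bin_by_body_site(sample_metadata):
    # Hypothetical metadata key; the return value names the bin.
    return sample_metadata["BODY_SITE"]


# Hypothetical (sample_id, metadata) pairs.
samples = [
    ("S1", {"BODY_SITE": "gut"}),
    ("S2", {"BODY_SITE": "skin"}),
    ("S3", {"BODY_SITE": "gut"}),
]

bins = defaultdict(list)
for sample_id, metadata in samples:
    bins[bin_by_body_site(metadata)].append(sample_id)
```

Here `bins` groups S1 and S3 under "gut" and S2 under "skin"; the real methods yield one table per bin rather than lists of IDs.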
- collapseObservationsByMetadata(metadata_f, reduce_f=<built-in function add>, norm=True, min_group_size=2)¶
Collapse a table by observation metadata
Bin observations by metadata then collapse each bin into a single observation. Metadata for the collapsed observations are retained and can be referred to by the ObservationId from each observation within the bin.
- collapseSamplesByMetadata(metadata_f, reduce_f=<built-in function add>, norm=True, min_group_size=2)¶
Collapse a table by sample metadata
Bin samples by metadata then collapse each bin into a single sample. Metadata for the collapsed samples are retained and can be referred to by the SampleId from each sample within the bin.
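The bin-then-collapse idea can be sketched as below. This is a simplified illustration only: `collapse_samples` is a hypothetical function, it ignores the norm and min_group_size parameters entirely, and it combines vectors element by element with reduce_f, which is an assumption about how the collapse is applied.

```python
from functools import reduce
from operator import add


def collapse_samples(samples, metadata_f, reduce_f=add):
    """Group (id, values, metadata) records by bin, then merge each bin."""
    bins = {}
    for sample_id, values, metadata in samples:
        bins.setdefault(metadata_f(metadata), []).append(values)

    def combine(left, right):
        # Apply reduce_f element by element across two vectors.
        return [reduce_f(x, y) for x, y in zip(left, right)]

    return {name: reduce(combine, vectors) for name, vectors in bins.items()}


# Hypothetical samples, each holding counts for two observations.
samples = [
    ("S1", [1, 2], {"site": "gut"}),
    ("S2", [3, 4], {"site": "gut"}),
    ("S3", [5, 6], {"site": "skin"}),
]
collapsed = collapse_samples(samples, lambda md: md["site"])
```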
- copy()¶
Returns a copy of the Table
- delimitedSelf(delim='\t', header_key=None, header_value=None, metadata_formatter=<type 'str'>)¶
Stringify self in a delimited form
Default str output for the Table is just row/col ids and table data without any metadata
If header_key is not None, try to pull out that key from observation metadata. If header_value is not None, use the header_value in the output.
- metadata_formatter: a function which takes a metadata entry and returns a formatted version that should be written to file
- filterObservations(f, invert=False)¶
Filter observations in self based on f
f must accept three variables, the observation values, observation ids and observation metadata. The function must only return true or false.
- filterSamples(f, invert=False)¶
Filter samples in self based on f
f must accept three variables, the sample values, sample IDs and sample metadata. The function must only return true or false.
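A filter function therefore has the shape f(values, id, metadata) and returns a boolean. The sketch below is independent of biom-format: `keep_abundant` and `apply_filter` are hypothetical, and it assumes that a True result keeps the record and that invert flips that decision, which the text above does not state explicitly.

```python
def keep_abundant(values, id_, metadata):
    """Return True when the vector's total count is at least 10."""
    return sum(values) >= 10


def apply_filter(records, f, invert=False):
    # Assumption: f returning True keeps the record; invert flips that.
    return [
        rec_id
        for values, rec_id, metadata in records
        if bool(f(values, rec_id, metadata)) != invert
    ]


# Hypothetical (values, id, metadata) records; metadata may be None.
records = [([5, 6], "S1", None), ([1, 2], "S2", None)]
kept = apply_filter(records, keep_abundant)
dropped = apply_filter(records, keep_abundant, invert=True)
```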
- getBiomFormatJsonString(generated_by)¶
Returns a JSON string representing the table in Biom format.
- getBiomFormatObject(generated_by)¶
Returns a dictionary representing the table in Biom format.
This dictionary can then be easily converted into a JSON string for serialization.
generated_by - a string describing the software used to build the table
TODO: This method may be very inefficient in terms of memory usage, so it needs to be tested with several large tables to determine if optimizations are necessary or not (i.e. subclassing JSONEncoder, using generators, etc...).
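Turning such a dictionary into a JSON string is a single call to the standard library. The dictionary below is a reduced, illustrative stand-in rather than the actual output of getBiomFormatObject: it shows only three fields, and the generated_by value is a made-up tool name.

```python
import json

# Reduced, illustrative dictionary; the real object holds more fields.
biom_like = {
    "generated_by": "my-analysis-script 0.1",  # hypothetical tool name
    "shape": [2, 3],
    "data": [[0, 0, 5], [0, 2, 1], [1, 1, 7]],
}

json_string = json.dumps(biom_like)

# Parsing the string back recovers an equal dictionary.
round_trip = json.loads(json_string)
```

As the TODO above notes, json.dumps builds the whole string in memory, so very large tables may need a streaming approach instead.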
- getBiomFormatPrettyPrint(generated_by)¶
Returns a ‘pretty print’ format of a biom file
WARNING: This method displays data values in a columnar format and can be misleading.
- getObservationIndex(obs_id)¶
Returns the obs index
- getSampleIndex(samp_id)¶
Returns the sample index
- getValueByIds(obs_id, samp_id)¶
Return the value in the matrix corresponding to (obs_id, samp_id)
- isEmpty()¶
Returns true if the table is empty
- iterObservationData()¶
Yields observation_values
- iterObservations(conv_to_np=True)¶
Yields (observation_value, observation_id, observation_metadata)
NOTE: will return None in observation_metadata positions if self.ObservationMetadata is set to None
- iterSampleData()¶
Yields sample_values
- iterSamples(conv_to_np=True)¶
Yields (sample_values, sample_id, sample_metadata)
NOTE: will return None in sample_metadata positions if self.SampleMetadata is set to None
- merge(other, Sample='union', Observation='union', merge_f=<built-in function add>, sample_metadata_f=<function prefer_self at 0x102adf2a8>, observation_metadata_f=<function prefer_self at 0x102adf2a8>)¶
Merge two tables together
The axes, samples and observations, can be controlled independently. Both can either work on ‘union’ or ‘intersection’.
merge_f is a function that takes two arguments and returns a value. The method is parameterized so that values can be added or subtracted where there is overlap in (sample_id, observation_id) values in the tables
sample_metadata_f and observation_metadata_f define how to merge metadata between tables. The default is to keep the metadata associated with self if self has metadata, and otherwise to take the metadata from other. These functions are given both metadata dicts and must return a single metadata dict
NOTE: There is an implicit type conversion to float. Tables using strings as the type are not supported. No check is currently in place.
NOTE: The return type is always that of self
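The interaction of the two axes with merge_f can be sketched as follows. This is not the library's implementation: `pick_ids` and `merge_tables` are hypothetical, tables are modelled as plain dicts, metadata merging is left out, and the handling of cells present in only one table is an assumption.

```python
from operator import add


def pick_ids(ids_a, ids_b, how):
    if how == "union":
        return sorted(set(ids_a) | set(ids_b))
    if how == "intersection":
        return sorted(set(ids_a) & set(ids_b))
    raise ValueError("how must be 'union' or 'intersection'")


def merge_tables(t1, t2, sample="union", observation="union", merge_f=add):
    samples = pick_ids(t1["samples"], t2["samples"], sample)
    observations = pick_ids(t1["observations"], t2["observations"], observation)
    merged = {}
    for obs in observations:
        for samp in samples:
            in1 = obs in t1["observations"] and samp in t1["samples"]
            in2 = obs in t2["observations"] and samp in t2["samples"]
            v1 = t1["values"].get((obs, samp), 0)
            v2 = t2["values"].get((obs, samp), 0)
            if in1 and in2:
                value = merge_f(v1, v2)  # overlap: combine both values
            elif in1:
                value = v1
            else:
                value = v2  # zero when neither table covers the cell
            merged[(obs, samp)] = float(value)  # mirrors the float conversion
    return {"samples": samples, "observations": observations, "values": merged}


# Hypothetical tables sharing sample S2 and observation O1.
t1 = {"samples": ["S1", "S2"], "observations": ["O1"],
      "values": {("O1", "S1"): 1, ("O1", "S2"): 2}}
t2 = {"samples": ["S2", "S3"], "observations": ["O1", "O2"],
      "values": {("O1", "S2"): 10, ("O2", "S3"): 4}}

union = merge_tables(t1, t2)
both = merge_tables(t1, t2, sample="intersection", observation="intersection")
```

With the default add, the shared cell (O1, S2) becomes 12.0 in both results; passing operator.sub as merge_f would subtract instead.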
- nonzero()¶
Returns tuples of nonzero locations within the data matrix
The values returned are (observation_id, sample_id)
- normObservationByMetadata(obs_metadata_id)¶
Return new table with counts divided by specified metadata value
- normObservationBySample()¶
Return new table with relative abundance in each sample
- normSampleByObservation()¶
Return new table with relative abundance in each observation
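Relative abundance within each sample means dividing every count by its sample's total. A minimal sketch of that arithmetic, using a list of observation rows in place of a table, is shown below; `relative_abundance_by_sample` is a hypothetical helper, and returning 0.0 for an all-zero sample is an assumption made here to avoid division by zero.

```python
def relative_abundance_by_sample(rows):
    """Divide each count by the total of its sample (column)."""
    n_samples = len(rows[0])
    totals = [sum(row[j] for row in rows) for j in range(n_samples)]
    return [
        [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_samples)]
        for row in rows
    ]


# Hypothetical counts: two observations (rows) by two samples (columns).
counts = [[2, 0], [6, 5]]
relative = relative_abundance_by_sample(counts)
print(relative)  # [[0.25, 0.0], [0.75, 1.0]]
```

Normalizing within each observation is the same calculation applied to rows instead of columns.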
- observationData(id_)¶
Return samples associated with an observation id
- observationExists(id_)¶
Returns True if observation exists, False otherwise
- reduce(f, axis)¶
Reduce over axis with f
axis can be either ‘sample’ or ‘observation’
- sampleData(id_, conv_to_np=False)¶
Return observations associated with a sample id
- sampleExists(id_)¶
Returns True if sample exists, False otherwise
- setValueByIds(obs_id, samp_id, val)¶
Set the value in the matrix corresponding to (obs_id, samp_id)
- sortByObservationId(sort_f=<function natsort at 0x102ad7320>)¶
Return a table sorted by sort_f
- sortBySampleId(sort_f=<function natsort at 0x102ad7320>)¶
Return a table sorted by sort_f
- sortObservationOrder(obs_order)¶
Return a new table in observation order
- sortSampleOrder(sample_order)¶
Return a new table in sample order
- sum(axis='whole')¶
Returns the sum by axis
axis can be:
- ‘whole’ : whole matrix sum
- ‘sample’ : return a vector with a sum for each sample
- ‘observation’ : return a vector with a sum for each observation
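The three axis options correspond to the following arithmetic, sketched here on a list of observation rows rather than on a real table. `table_sum` is a hypothetical function that illustrates the meaning of each option; it is not the library's code.

```python
def table_sum(rows, axis="whole"):
    """Sum a list of observation rows (samples are columns)."""
    if axis == "whole":
        return sum(sum(row) for row in rows)
    if axis == "sample":
        return [sum(column) for column in zip(*rows)]
    if axis == "observation":
        return [sum(row) for row in rows]
    raise ValueError("axis must be 'whole', 'sample' or 'observation'")


# Hypothetical counts: two observations by three samples.
counts = [[5, 0, 1], [0, 7, 0]]
whole_total = table_sum(counts)                     # 13
per_sample = table_sum(counts, "sample")            # [5, 7, 1]
per_observation = table_sum(counts, "observation")  # [6, 7]
```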
- transformObservations(f)¶
Apply a function to each observation
f is passed a numpy vector and must return a vector
- transformSamples(f)¶
Apply a function to each sample
f is passed a numpy vector and must return a vector
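A transform function takes one vector and returns a vector of the same length. The sketch below uses plain lists where the real methods pass numpy vectors; `presence_absence` is a hypothetical example function, and the list comprehension stands in for the table applying it to each sample or observation.

```python
def presence_absence(vector):
    """Map every positive count to 1 and everything else to 0."""
    return [1 if value > 0 else 0 for value in vector]


# Hypothetical counts: two observations by three samples.
counts = [[5, 0, 1], [0, 7, 0]]
transformed = [presence_absence(row) for row in counts]
print(transformed)  # [[1, 0, 1], [0, 1, 0]]
```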
OTUTable¶
The OTUTable base class provides functionality specific to OTU tables. Currently, it only provides a static private member variable that describes its BIOM type. This object was stubbed out in case future methods are developed that do not make sense in the context of, say, an MG-RAST metagenomic abundance table. It is advised to always use an object that subclasses OTUTable if the analysis is on OTU data.
- class biom.table.OTUTable¶
OTU table abstract class
PathwayTable¶
A table type to represent gene pathways.
- class biom.table.PathwayTable¶
Pathway table abstract class
FunctionTable¶
A table type to represent gene functions.
- class biom.table.FunctionTable¶
Function table abstract class
OrthologTable¶
A table type to represent gene orthologs.
- class biom.table.OrthologTable¶
Ortholog table abstract class
Container classes¶
The container classes implement required private member variable interfaces as defined by the Table abstract base class. Specifically, these objects define the ways in which data is moved into and out of the contained data object. These are fully functional and usable objects; however, they do not implement table type specific functionality.
SparseTable¶
The subclass SparseTable can be derived for use with table data. This object implements all of the required private interfaces specified by the Table base class. The object contains a _data private member variable that is an instance of biom.table.SparseDict. It is advised to use derived objects of SparseTable if the data being operated on is sparse.
- class biom.table.SparseTable(*args, **kwargs)¶
DenseTable¶
The DenseTable object fulfills all private member methods stubbed out by the Table base class. The dense table contains a private member variable that is an instance of numpy.array. The array object is a matrix that contains all values including zeros. It is advised to use this table only if the number of samples and observations is reasonable. Unfortunately, it isn’t reasonable to define reasonable in this context. However, if either the number of observations or the number of samples is > 1000, it would probably be a good idea to rely on a SparseTable.
- class biom.table.DenseTable(*args, **kwargs)¶
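One rough way to judge whether a dense layout is worthwhile is to look at the matrix dimensions alongside the fraction of nonzero cells. The sketch below is only a heuristic illustration: `density` and `prefer_sparse` are hypothetical helpers, the 1000 limit is taken from the guidance above, and the 0.5 density cutoff is an arbitrary assumption.

```python
def density(rows):
    """Fraction of cells in a list of rows that hold a nonzero value."""
    total = sum(len(row) for row in rows)
    nonzero = sum(1 for row in rows for value in row if value != 0)
    return nonzero / total if total else 0.0


def prefer_sparse(rows, size_limit=1000, density_cutoff=0.5):
    # Assumption: either a large axis or a mostly empty matrix favours sparse.
    n_obs = len(rows)
    n_samples = len(rows[0]) if rows else 0
    too_big = n_obs > size_limit or n_samples > size_limit
    return too_big or density(rows) < density_cutoff


# Hypothetical counts: two observations by three samples, half the cells zero.
counts = [[5, 0, 1], [0, 7, 0]]
fraction_nonzero = density(counts)  # 0.5
use_sparse = prefer_sparse(counts)  # False: small and not below the cutoff
```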
Table type objects¶
The table type objects define variables and methods specific to a table type. In most situations, these are the objects that should be instantiated.
DensePathwayTable¶
- class biom.table.DensePathwayTable(*args, **kwargs)¶
Instantiatable dense pathway table
SparsePathwayTable¶
- class biom.table.SparsePathwayTable(*args, **kwargs)¶
Instantiatable sparse pathway table
DenseFunctionTable¶
- class biom.table.DenseFunctionTable(*args, **kwargs)¶
Instantiatable dense function table
SparseFunctionTable¶
- class biom.table.SparseFunctionTable(*args, **kwargs)¶
Instantiatable sparse function table
DenseOrthologTable¶
- class biom.table.DenseOrthologTable(*args, **kwargs)¶
Instantiatable dense ortholog table
SparseOrthologTable¶
- class biom.table.SparseOrthologTable(*args, **kwargs)¶
Instantiatable sparse ortholog table
SparseGeneTable¶
- class biom.table.SparseGeneTable(*args, **kwargs)¶
Instantiatable sparse gene table