biom-format Table objects

The biom-format project provides rich Table objects to support use of the BIOM file format. The objects encapsulate matrix data (such as OTU counts) and abstract away interaction with the underlying storage. The immediate benefit is that the programmer does not have to worry about what the underlying data object is, which in turn allows different data representations to be supported. Currently, biom-format supports a dense object built on numpy.array (NumPy) and a sparse object built on Python dictionaries.

biom-format table_factory method

Generally, construction of a Table subclass will be through the table_factory method. This method facilitates any necessary data conversions and supports a wide variety of input data types.

biom.table.table_factory(data, sample_ids, observation_ids, sample_metadata=None, observation_metadata=None, table_id=None, constructor=<class 'biom.table.SparseOTUTable'>, **kwargs)

Construct a table

Attempts to make ‘data’ sane with respect to the constructor type through various means of juggling. Data can be:

  • numpy.array
  • list of numpy.array vectors
  • SparseDict representation
  • dict representation
  • list of SparseDict representation vectors
  • list of lists of sparse values [[row, col, value], ...]
  • list of lists of dense values [[value, value, ...], ...]
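
As a rough illustration of the last two input forms, the stand-alone sketch below expands a list of sparse [row, col, value] triples into the equivalent list of dense rows. It does not call biom-format; the helper name sparse_triples_to_dense and its explicit shape arguments are assumptions made for this example.

```python
def sparse_triples_to_dense(triples, n_rows, n_cols):
    """Expand [[row, col, value], ...] into a dense list of rows.

    Hypothetical helper for illustration; not part of biom-format.
    """
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in triples:
        dense[row][col] = value
    return dense


# Two rows by three columns, written in both forms.
sparse_values = [[0, 1, 5], [1, 0, 2], [1, 2, 7]]
dense_values = sparse_triples_to_dense(sparse_values, n_rows=2, n_cols=3)
print(dense_values)  # [[0, 5, 0], [2, 0, 7]]
```

Either list could, in principle, be handed to table_factory as the data argument, since both forms appear in the list above.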

Description of available Table objects

There are multiple objects available, but some of them are unofficial abstract base classes (for historical reasons, they do not use the abc module). In practice, the objects used should be the derived Tables such as SparseOTUTable or DenseGeneTable.

Abstract base classes

Abstract base classes establish standard interfaces for subclassed types and provide common functionality for derived types.


Table is a container object and an abstract base class that provides a common and required API for subclassed objects. Through the use of private interfaces, it is possible to create public methods that operate on the underlying datatype without having to implement each method in each subclass. For instance, Table.iterSamplesData will return a generator that always yields numpy.array vectors for each sample regardless of how the table data is actually stored. This functionality results from derived classes implementing private interfaces, such as Table._conv_to_np.
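
A minimal sketch of this design, assuming nothing about biom-format internals: the class and method names below (BaseTable, _column_as_list, iter_sample_vectors) are hypothetical stand-ins, and plain lists are used in place of numpy.array so the example has no dependencies. The point is only that the public iterator is written once against a private hook, and each storage type supplies the hook.

```python
class BaseTable(object):
    """Hypothetical stand-in for an abstract base table."""

    def __init__(self, data, n_obs, n_samples):
        self._data = data
        self.n_obs = n_obs
        self.n_samples = n_samples

    def _column_as_list(self, col):
        # Private hook: each storage-specific subclass must override this.
        raise NotImplementedError

    def iter_sample_vectors(self):
        # Public method written once, in terms of the private hook.
        for col in range(self.n_samples):
            yield self._column_as_list(col)


class DictBackedTable(BaseTable):
    """Stores only nonzero values in a {(row, col): value} dict."""

    def _column_as_list(self, col):
        return [self._data.get((row, col), 0) for row in range(self.n_obs)]


class ListBackedTable(BaseTable):
    """Stores every value, including zeros, in a list of rows."""

    def _column_as_list(self, col):
        return [self._data[row][col] for row in range(self.n_obs)]


sparse = DictBackedTable({(0, 1): 5, (1, 0): 2}, n_obs=2, n_samples=2)
dense = ListBackedTable([[0, 5], [2, 0]], n_obs=2, n_samples=2)
sparse_vectors = list(sparse.iter_sample_vectors())
dense_vectors = list(dense.iter_sample_vectors())
print(sparse_vectors == dense_vectors)  # True
```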

class biom.table.Table(Data, SampleIds, ObservationIds, SampleMetadata=None, ObservationMetadata=None, TableId=None, **kwargs)

Abstract base class for what a Table is


Take a dict of metadata and add it to an observation.



Take a dict of metadata and add it to a sample.



Yields tables by metadata

f is given the sample metadata by row and must return what “bin” the sample is part of.


Yields tables by metadata

f is given the sample metadata by row and must return what “bin” the sample is part of.

collapseObservationsByMetadata(metadata_f, reduce_f=<built-in function add>, norm=True, min_group_size=2)

Collapse a table by observation metadata

Bin observations by metadata then collapse each bin into a single observation. Metadata for the collapsed observations are retained and can be referred to by the ObservationId from each observation within the bin.

collapseSamplesByMetadata(metadata_f, reduce_f=<built-in function add>, norm=True, min_group_size=2)

Collapse a table by sample metadata

Bin samples by metadata then collapse each bin into a single sample. Metadata for the collapsed samples are retained and can be referred to by the SampleId from each sample within the bin.
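
The bin-then-collapse idea can be sketched without the library. The helper below is hypothetical and models only the metadata_f and reduce_f arguments; the norm and min_group_size options are deliberately not modeled because their exact behavior is not described here.

```python
from functools import reduce
from operator import add


def collapse_by_metadata(vectors, metadata, metadata_f, reduce_f=add):
    """Group vectors by metadata_f(md), then reduce each group elementwise.

    Hypothetical helper for illustration; not part of biom-format.
    """
    bins = {}
    for vector, md in zip(vectors, metadata):
        bins.setdefault(metadata_f(md), []).append(vector)
    collapsed = {}
    for key, group in bins.items():
        # zip(*group) walks the group position by position.
        collapsed[key] = [reduce(reduce_f, column) for column in zip(*group)]
    return collapsed


sample_vectors = [[1, 0, 3], [2, 2, 0], [5, 1, 1]]
sample_metadata = [{'site': 'gut'}, {'site': 'gut'}, {'site': 'skin'}]
collapsed = collapse_by_metadata(
    sample_vectors, sample_metadata, lambda md: md['site'])
print(collapsed['gut'])   # [3, 2, 3]
print(collapsed['skin'])  # [5, 1, 1]
```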


Returns a copy of the Table

delimitedSelf(delim='\t', header_key=None, header_value=None, metadata_formatter=<type 'str'>)

Stringify self in a delimited form

Default str output for the Table is just row/col ids and table data without any metadata

If header_key is not None, try to pull out that key from observation metadata. If header_value is not None, use the header_value in the output.

metadata_formatter: a function which takes a metadata entry and returns a formatted version that should be written to file

filterObservations(f, invert=False)

Filter observations in self based on f

f must accept three variables: the observation values, observation ids and observation metadata. The function must return only True or False.

filterSamples(f, invert=False)

Filter samples in self based on f

f must accept three variables: the sample values, sample IDs and sample metadata. The function must return only True or False.
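
The three-argument predicate shared by filterObservations and filterSamples can be sketched as follows. The apply_filter helper is a hypothetical stand-in for the library's filtering loop; it assumes that items for which f returns True are kept and that invert=True keeps the others instead, which should be checked against the library before relying on it.

```python
def keep_abundant(values, id_, metadata):
    """Predicate with the (values, id, metadata) shape described above.

    Returns True when the total count is at least 10.
    """
    return sum(values) >= 10


def apply_filter(records, f, invert=False):
    """Hypothetical stand-in for the filtering loop.

    Assumes records for which f returns True are kept, and that
    invert=True keeps the remaining records instead.
    """
    return [rec for rec in records if bool(f(*rec)) != invert]


# Each record is (values, id, metadata); metadata may be None.
records = [
    ([1, 2, 3], 'OTU1', {'taxonomy': 'Bacteria'}),
    ([8, 9, 0], 'OTU2', {'taxonomy': 'Archaea'}),
    ([0, 0, 4], 'OTU3', None),
]
kept_ids = [rec[1] for rec in apply_filter(records, keep_abundant)]
dropped_ids = [rec[1] for rec in
               apply_filter(records, keep_abundant, invert=True)]
print(kept_ids)     # ['OTU2']
print(dropped_ids)  # ['OTU1', 'OTU3']
```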


Returns a JSON string representing the table in Biom format.


Returns a dictionary representing the table in Biom format.

This dictionary can then be easily converted into a JSON string for serialization.

generated_by - a string describing the software used to build the table

TODO: This method may be very inefficient in terms of memory usage, so it needs to be tested with several large tables to determine if optimizations are necessary or not (i.e. subclassing JSONEncoder, using generators, etc...).
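
The dictionary-to-JSON step mentioned above can be sketched with the standard json module. The dictionary here is written by hand rather than produced by the library, and its field names and values follow the BIOM 1.0 layout only as an assumption to verify against the format specification; the generated_by string and date are made up for the example.

```python
import json

# Hand-written dictionary in the assumed BIOM 1.0 layout.
biom_dict = {
    'id': None,
    'format': 'Biological Observation Matrix 1.0.0',
    'format_url': 'http://biom-format.org',
    'type': 'OTU table',
    'generated_by': 'example-script 0.1',  # hypothetical software name
    'date': '2012-01-01T00:00:00',         # hypothetical timestamp
    'rows': [{'id': 'OTU1', 'metadata': None},
             {'id': 'OTU2', 'metadata': None}],
    'columns': [{'id': 'Sample1', 'metadata': None},
                {'id': 'Sample2', 'metadata': None}],
    'matrix_type': 'sparse',
    'matrix_element_type': 'int',
    'shape': [2, 2],
    'data': [[0, 1, 5], [1, 0, 2]],  # [row, col, value] triples
}

biom_json = json.dumps(biom_dict)       # serialize to a JSON string
round_tripped = json.loads(biom_json)   # parse it back into a dict
```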


Returns a ‘pretty print’ format of a biom file

WARNING: This method displays data values in a columnar format and can be misleading.


Returns the obs index


Returns the sample index

getValueByIds(obs_id, samp_id)

Return the value in the matrix corresponding to (obs_id, samp_id)


Returns true if the table is empty


Yields observation_values


Yields (observation_value, observation_id, observation_metadata)

NOTE: will return None in observation_metadata positions if self.ObservationMetadata is set to None


Yields sample_values


Yields (sample_values, sample_id, sample_metadata)

NOTE: will return None in sample_metadata positions if self.SampleMetadata is set to None

merge(other, Sample='union', Observation='union', merge_f=<built-in function add>, sample_metadata_f=<function prefer_self>, observation_metadata_f=<function prefer_self>)

Merge two tables together

The two axes, samples and observations, can be controlled independently. Each can work on either ‘union’ or ‘intersection’.

merge_f is a function that takes two arguments and returns a value. The method is parameterized so that values can be added or subtracted where there is overlap in (sample_id, observation_id) values in the tables

sample_metadata_f and observation_metadata_f define how to merge metadata between tables. The default is to keep the metadata associated with self if self has metadata, and otherwise to take the metadata from other. These functions are given both metadata dicts and must return a single metadata dict.

NOTE: There is an implicit type conversion to float. Tables using strings as the type are not supported. No check is currently in place.

NOTE: The return type is always that of self
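
The independent handling of the two axes can be sketched with plain dictionaries. Everything below is hypothetical: tables are modeled as dicts with 'samples', 'observations' and a {(obs_id, sample_id): value} mapping, cells absent from a table are treated as 0, and metadata merging is left out. It is a sketch of the idea, not the library's implementation.

```python
from operator import add


def combine_ids(a_ids, b_ids, how):
    """Combine two id lists by 'union' or 'intersection'."""
    if how == 'union':
        return sorted(set(a_ids) | set(b_ids))
    if how == 'intersection':
        return sorted(set(a_ids) & set(b_ids))
    raise ValueError("how must be 'union' or 'intersection'")


def covers(table, obs, samp):
    """True if the table has both the observation and the sample."""
    return obs in table['observations'] and samp in table['samples']


def merge_tables(a, b, sample='union', observation='union', merge_f=add):
    """Hypothetical merge of two dict-based tables; values become floats."""
    samples = combine_ids(a['samples'], b['samples'], sample)
    observations = combine_ids(a['observations'], b['observations'],
                               observation)
    merged = {}
    for obs in observations:
        for samp in samples:
            key = (obs, samp)
            a_val = a['values'].get(key, 0)
            b_val = b['values'].get(key, 0)
            if covers(a, obs, samp) and covers(b, obs, samp):
                value = merge_f(a_val, b_val)  # overlap: apply merge_f
            elif covers(a, obs, samp):
                value = a_val
            else:
                value = b_val
            if value:
                merged[key] = float(value)
    return samples, observations, merged


t1 = {'samples': ['S1', 'S2'], 'observations': ['O1', 'O2'],
      'values': {('O1', 'S1'): 1, ('O2', 'S2'): 4}}
t2 = {'samples': ['S2', 'S3'], 'observations': ['O2', 'O3'],
      'values': {('O2', 'S2'): 6, ('O3', 'S3'): 2}}

union_result = merge_tables(t1, t2)
inter_result = merge_tables(t1, t2, sample='intersection',
                            observation='intersection')
```

With these inputs, the union keeps all three samples and observations and adds the overlapping cell ('O2', 'S2'), while the intersection keeps only that one cell.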


Returns types of nonzero locations within the data matrix

The values returned are (observation_id, sample_id)


Return new table with counts divided by specified metadata value


Return new table with relative abundance in each sample


Return new table with relative abundance in each observation
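
As a sketch of what relative abundance per sample means, the stand-alone function below divides every count by its sample's total. It assumes rows are observations and columns are samples, and it maps samples with a zero total to 0.0; both choices are assumptions of this example rather than documented library behavior.

```python
def relative_abundance_by_sample(matrix):
    """Divide each value by the total of its column (sample).

    Hypothetical helper; rows are observations, columns are samples.
    """
    totals = [sum(column) for column in zip(*matrix)]
    return [
        [value / float(total) if total else 0.0
         for value, total in zip(row, totals)]
        for row in matrix
    ]


counts = [[1, 0],
          [3, 5]]
relative = relative_abundance_by_sample(counts)
print(relative)  # [[0.25, 0.0], [0.75, 1.0]]
```

Normalizing per observation would be the same computation with row totals in place of column totals.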


Return samples associated with an observation id


Returns True if observation exists, False otherwise

reduce(f, axis)

Reduce over axis with f

axis can be either ‘sample’ or ‘observation’

sampleData(id_, conv_to_np=False)

Return observations associated to a sample id


Returns True if sample exists, False otherwise

setValueByIds(obs_id, samp_id, val)

Set the value in the matrix corresponding to (obs_id, samp_id)

sortByObservationId(sort_f=<function natsort>)

Return a table sorted by sort_f

sortBySampleId(sort_f=<function natsort>)

Return a table sorted by sort_f


Return a new table in observation order


Return a new table in sample order


Returns the sum by axis

axis can be:

  • ‘whole’: whole matrix sum
  • ‘sample’: return a vector with a sum for each sample
  • ‘observation’: return a vector with a sum for each observation
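
The three axis values can be illustrated on a small list-of-lists matrix. The table_sum function is a hypothetical stand-in, and it assumes rows are observations and columns are samples.

```python
def table_sum(matrix, axis='whole'):
    """Sum a list-of-rows matrix over the requested axis.

    Hypothetical helper; rows are observations, columns are samples.
    """
    if axis == 'whole':
        return sum(sum(row) for row in matrix)
    if axis == 'sample':
        return [sum(column) for column in zip(*matrix)]
    if axis == 'observation':
        return [sum(row) for row in matrix]
    raise ValueError("unknown axis: %r" % (axis,))


matrix = [[0, 5, 0],
          [2, 0, 7]]
print(table_sum(matrix, 'whole'))        # 14
print(table_sum(matrix, 'sample'))       # [2, 5, 7]
print(table_sum(matrix, 'observation'))  # [5, 9]
```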


Apply a function to each observation

f is passed a numpy vector and must return a vector


Apply a function to each sample

f is passed a numpy vector and must return a vector
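
A vector-in, vector-out function of the kind described above might look like the following. Plain lists stand in for numpy vectors so the sketch has no dependencies, and transform_each is a hypothetical stand-in for the library's per-sample or per-observation loop.

```python
def presence_absence(vector):
    """Map each count to 1 if it is positive, else 0."""
    return [1 if value > 0 else 0 for value in vector]


def transform_each(vectors, f):
    """Hypothetical stand-in: apply f to every vector and collect results."""
    return [f(vector) for vector in vectors]


sample_vectors = [[0, 5, 2], [3, 0, 0]]
transformed = transform_each(sample_vectors, presence_absence)
print(transformed)  # [[0, 1, 1], [1, 0, 0]]
```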


The OTUTable base class provides functionality specific to OTU tables. Currently, it only provides a static private member variable that describes its BIOM type. This object was stubbed out in case future methods are developed that do not make sense in the context of, say, an MG-RAST metagenomic abundance table. It is advised to always use an object that subclasses OTUTable if the analysis is on OTU data.

class biom.table.OTUTable

OTU table abstract class


A table type to represent gene pathways.

class biom.table.PathwayTable

Pathway table abstract class


A table type to represent gene functions.

class biom.table.FunctionTable

Function table abstract class


A table type to represent gene orthologs.

class biom.table.OrthologTable

Ortholog table abstract class


A table type to represent genes.

class biom.table.GeneTable

Gene table abstract class


A table type to represent metabolite profiles.

class biom.table.MetaboliteTable

Metabolite table abstract class


A table type to represent taxonomies.

class biom.table.TaxonTable

Taxon table abstract class

Container classes

The container classes implement required private member variable interfaces as defined by the Table abstract base class. Specifically, these objects define the ways in which data is moved into and out of the contained data object. These are fully functional and usable objects; however, they do not implement table-type-specific functionality.


The SparseTable subclass can be derived for use with table data. This object implements all of the required private interfaces specified by the Table base class. The object contains a _data private member variable that is an instance of biom.table.SparseDict. It is advised to use objects derived from SparseTable if the data being operated on is sparse.

class biom.table.SparseTable(*args, **kwargs)


The DenseTable object fulfills all private member methods stubbed out by the Table base class. The dense table contains a private member variable that is an instance of numpy.array. The array object is a matrix that contains all values, including zeros. It is advised to use this table only if the number of samples and observations is reasonable. Unfortunately, it isn’t reasonable to define “reasonable” in this context. However, if either the number of observations or the number of samples is greater than 1000, it would probably be a good idea to rely on a SparseTable.
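
To make the storage trade-off concrete, the sketch below counts how many values a dense layout and a dictionary-of-nonzeros layout would each hold for the same data. The helper name and the tiny example matrix are hypothetical; real tables differ in per-value overhead, so this shows only the trend.

```python
def count_stored_values(dense_rows):
    """Return (dense_count, sparse_count) for a list-of-rows matrix.

    dense_count is every cell; sparse_count is only the nonzero cells,
    as a {(row, col): value} dict would store them.
    """
    dense_count = sum(len(row) for row in dense_rows)
    sparse = {
        (r, c): value
        for r, row in enumerate(dense_rows)
        for c, value in enumerate(row)
        if value != 0
    }
    return dense_count, len(sparse)


rows = [[0, 0, 0, 4],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
dense_count, sparse_count = count_stored_values(rows)
print(dense_count, sparse_count)  # 12 2
```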

class biom.table.DenseTable(*args, **kwargs)

Table type objects

The table type objects define variables and methods specific to a table type. In most situations, these are the objects that should be instantiated.


class biom.table.DenseOTUTable(*args, **kwargs)

Instantiatable dense OTU table


class biom.table.SparseOTUTable(*args, **kwargs)

Instantiatable sparse OTU table


class biom.table.DensePathwayTable(*args, **kwargs)

Instantiatable dense pathway table


class biom.table.SparsePathwayTable(*args, **kwargs)

Instantiatable sparse pathway table


class biom.table.DenseFunctionTable(*args, **kwargs)

Instantiatable dense function table


class biom.table.SparseFunctionTable(*args, **kwargs)

Instantiatable sparse function table


class biom.table.DenseOrthologTable(*args, **kwargs)

Instantiatable dense ortholog table


class biom.table.SparseOrthologTable(*args, **kwargs)

Instantiatable sparse ortholog table


class biom.table.DenseGeneTable(*args, **kwargs)

Instantiatable dense gene table


class biom.table.SparseGeneTable(*args, **kwargs)

Instantiatable sparse gene table


class biom.table.DenseMetaboliteTable(*args, **kwargs)

Instantiatable dense metabolite table


class biom.table.SparseMetaboliteTable(*args, **kwargs)

Instantiatable sparse metabolite table
