biom-format.org

biom-format Table objects

The biom-format project provides rich Table objects to support use of the BIOM file format. The objects encapsulate matrix data (such as OTU counts) and abstract the interaction away from the programmer. This provides the immediate benefit of the programmer not having to worry about what the underlying data object is, and in turn allows for different data representations to be supported. Currently, biom-format supports a dense object built off of numpy.array (NumPy) and a sparse object built off of Python dictionaries.

biom-format table_factory method

Generally, construction of a Table subclass will be through the table_factory method. This method facilitates any necessary data conversions and supports a wide variety of input data types.

biom.table.table_factory(data, sample_ids, observation_ids, sample_metadata=None, observation_metadata=None, table_id=None, constructor=<class 'biom.table.SparseOTUTable'>, **kwargs)

Construct a table

Attempts to make ‘data’ sane with respect to the constructor type through various means of juggling. Data can be:

  • numpy.array
  • list of numpy.array vectors
  • SparseDict representation
  • dict representation
  • list of SparseDict representation vectors
  • list of lists of sparse values [[row, col, value], ...]
  • list of lists of dense values [[value, value, ...], ...]
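The last two input forms describe the same matrix in two ways. The following is a minimal standalone sketch, not biom-format code: sparse_triples_to_dense is a hypothetical helper and the values are made up. It expands [[row, col, value], ...] triples into the equivalent list of dense rows, with rows as observations and columns as samples.

```python
def sparse_triples_to_dense(triples, n_rows, n_cols):
    """Expand [[row, col, value], ...] into a list of dense rows.

    Hypothetical helper for illustration only; it is not part of biom-format.
    """
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in triples:
        dense[row][col] = value
    return dense


# Two observations (rows) by three samples (columns); unlisted cells are zero.
sparse = [[0, 0, 5], [0, 2, 1], [1, 1, 3]]
dense = sparse_triples_to_dense(sparse, n_rows=2, n_cols=3)
print(dense)
```

Either form could be passed as data; the sparse form only lists the nonzero cells.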

Description of available Table objects

There are multiple objects available, but some of them are unofficial abstract base classes (for historical reasons, they do not use the abc module). In practice, the objects used should be the derived Tables such as SparseOTUTable or DenseGeneTable.

Abstract base classes

Abstract base classes establish standard interfaces for subclassed types and provide common functionality for derived types.

Table

Table is a container object and an abstract base class that provides a common and required API for subclassed objects. Through the use of private interfaces, it is possible to create public methods that operate on the underlying datatype without having to implement each method in each subclass. For instance, Table.iterSampleData will return a generator that always yields numpy.array vectors for each sample regardless of how the table data is actually stored. This functionality results from derived classes implementing private interfaces, such as Table._conv_to_np.
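The pattern described above can be sketched without biom-format itself. In this hypothetical example, MiniTable, MiniDense and MiniSparse are invented names (they are not biom classes), and plain lists stand in for numpy.array vectors. The public iterator is written once in the base class, and each subclass supplies only the private hooks for its own storage.

```python
class MiniTable(object):
    """Hypothetical stand-in for an abstract base class (not biom.table.Table)."""

    def iter_sample_data(self):
        # Public method written once; it relies on private hooks per subclass.
        for index in range(self._n_samples()):
            yield self._sample_as_list(index)

    def _n_samples(self):
        raise NotImplementedError

    def _sample_as_list(self, index):
        raise NotImplementedError


class MiniDense(MiniTable):
    def __init__(self, rows):
        self._rows = rows  # list of observation rows, zeros included

    def _n_samples(self):
        return len(self._rows[0])

    def _sample_as_list(self, index):
        return [row[index] for row in self._rows]


class MiniSparse(MiniTable):
    def __init__(self, cells, shape):
        self._cells = cells  # {(row, col): value}, zeros omitted
        self._shape = shape  # (n_observations, n_samples)

    def _n_samples(self):
        return self._shape[1]

    def _sample_as_list(self, index):
        return [self._cells.get((row, index), 0) for row in range(self._shape[0])]


dense = MiniDense([[5, 0, 1], [0, 3, 0]])
sparse = MiniSparse({(0, 0): 5, (0, 2): 1, (1, 1): 3}, shape=(2, 3))
print(list(dense.iter_sample_data()))
print(list(sparse.iter_sample_data()))
```

Both objects yield the same per-sample vectors even though one stores every cell and the other stores only nonzero cells.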

class biom.table.Table(Data, SampleIds, ObservationIds, SampleMetadata=None, ObservationMetadata=None, TableId=None, **kwargs)

Abstract base class for what a Table is

addObservationMetadata(md)

Take a dict of metadata and add it to an observation.

{observation_id:{dict_of_metadata}}

addSampleMetadata(md)

Take a dict of metadata and add it to a sample.

{sample_id:{dict_of_metadata}}

binObservationsByMetadata(f)

Yields tables by metadata

f is given the observation metadata by row and must return what “bin” the observation is part of.

binSamplesByMetadata(f)

Yields tables by metadata

f is given the sample metadata by row and must return what “bin” the sample is part of.
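As a rough illustration of what a binning function f looks like, the standalone sketch below groups sample ids by the value f returns for each metadata dict. bin_ids_by_metadata is a hypothetical helper and the "site" key is made up; the real method yields tables rather than lists of ids.

```python
from collections import defaultdict


def bin_ids_by_metadata(ids, metadata, f):
    """Group ids by the bin label that f returns for each metadata dict.

    Hypothetical helper for illustration only; it is not part of biom-format.
    """
    bins = defaultdict(list)
    for id_, md in zip(ids, metadata):
        bins[f(md)].append(id_)
    return dict(bins)


sample_ids = ["S1", "S2", "S3"]
sample_metadata = [{"site": "gut"}, {"site": "skin"}, {"site": "gut"}]

# f receives one metadata dict and returns the bin that sample belongs to.
bins = bin_ids_by_metadata(sample_ids, sample_metadata, lambda md: md["site"])
print(bins)
```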

collapseObservationsByMetadata(metadata_f, reduce_f=<built-in function add>, norm=True, min_group_size=2)

Collapse a table by observation metadata

Bin observations by metadata then collapse each bin into a single observation. Metadata for the collapsed observations are retained and can be referred to by the ObservationId from each observation within the bin.

collapseSamplesByMetadata(metadata_f, reduce_f=<built-in function add>, norm=True, min_group_size=2)

Collapse a table by sample metadata

Bin samples by metadata then collapse each bin into a single sample. Metadata for the collapsed samples are retained and can be referred to by the SampleId from each sample within the bin.
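The bin-then-collapse idea can be sketched as follows. This is a simplified standalone example, not the library implementation: collapse_columns is a hypothetical helper, the data are made up, and the norm and min_group_size parameters are deliberately not modelled.

```python
from functools import reduce
from operator import add


def collapse_columns(columns, bin_of, reduce_f=add):
    """Collapse sample vectors that share a bin into one vector per bin.

    columns: {sample_id: [values]}; bin_of: function mapping sample_id to a bin.
    Hypothetical helper for illustration only; it is not part of biom-format.
    """
    grouped = {}
    for sample_id in sorted(columns):
        grouped.setdefault(bin_of(sample_id), []).append(columns[sample_id])

    collapsed = {}
    for label, vectors in grouped.items():
        # Combine the vectors element by element with reduce_f.
        collapsed[label] = [reduce(reduce_f, values) for values in zip(*vectors)]
    return collapsed


columns = {"S1": [5, 0], "S2": [0, 3], "S3": [1, 2]}
site = {"S1": "gut", "S2": "skin", "S3": "gut"}
collapsed = collapse_columns(columns, site.get)
print(collapsed)
```

With the default reduce_f of addition, the two "gut" samples are summed into a single vector and the lone "skin" sample is carried through unchanged.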

copy()

Returns a copy of the Table

delimitedSelf(delim='\t', header_key=None, header_value=None, metadata_formatter=<type 'str'>)

Stringify self in a delimited form

Default str output for the Table is just row/col ids and table data without any metadata

If header_key is not None, try to pull out that key from observation metadata. If header_value is not None, use the header_value in the output.

metadata_formatter: a function which takes a metadata entry and returns a formatted version that should be written to file

filterObservations(f, invert=False)

Filter observations in self based on f

f must accept three variables, the observation values, observation ids and observation metadata. The function must only return true or false.

filterSamples(f, invert=False)

Filter samples in self based on f

f must accept three variables, the sample values, sample IDs and sample metadata. The function must only return true or false.
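The shape of a filter function can be sketched in isolation. In this standalone example, filter_samples is a hypothetical stand-in for the method and the data are made up. It assumes that samples for which f returns True are kept and that invert=True flips that choice; this is an assumption about the behaviour, not something the text above states.

```python
from operator import xor


def filter_samples(samples, f, invert=False):
    """Keep the (values, sample_id, metadata) tuples selected by f.

    Hypothetical helper; assumes True means "keep" unless invert is set.
    """
    return [s for s in samples if xor(bool(f(*s)), invert)]


samples = [
    ([5, 0], "S1", {"site": "gut"}),
    ([0, 3], "S2", {"site": "skin"}),
    ([1, 2], "S3", {"site": "gut"}),
]


def is_gut(values, sample_id, metadata):
    # Accepts the three arguments described above and returns True or False.
    return metadata["site"] == "gut"


kept = [s[1] for s in filter_samples(samples, is_gut)]
dropped = [s[1] for s in filter_samples(samples, is_gut, invert=True)]
print(kept, dropped)
```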

getBiomFormatJsonString(generated_by)

Returns a JSON string representing the table in Biom format.

getBiomFormatObject(generated_by)

Returns a dictionary representing the table in Biom format.

This dictionary can then be easily converted into a JSON string for serialization.

generated_by - a string describing the software used to build the table

TODO: This method may be very inefficient in terms of memory usage, so it needs to be tested with several large tables to determine if optimizations are necessary or not (i.e. subclassing JSONEncoder, using generators, etc...).

getBiomFormatPrettyPrint(generated_by)

Returns a ‘pretty print’ format of a biom file

WARNING: This method displays data values in a columnar format and can be misleading.

getObservationIndex(obs_id)

Returns the obs index

getSampleIndex(samp_id)

Returns the sample index

getValueByIds(obs_id, samp_id)

Return the value in the matrix corresponding to (obs_id, samp_id)

isEmpty()

Returns true if the table is empty

iterObservationData()

Yields observation_values

iterObservations(conv_to_np=True)

Yields (observation_value, observation_id, observation_metadata)

NOTE: will return None in observation_metadata positions if self.ObservationMetadata is set to None

iterSampleData()

Yields sample_values

iterSamples(conv_to_np=True)

Yields (sample_values, sample_id, sample_metadata)

NOTE: will return None in sample_metadata positions if self.SampleMetadata is set to None

merge(other, Sample='union', Observation='union', merge_f=<built-in function add>, sample_metadata_f=<function prefer_self at 0x102adf2a8>, observation_metadata_f=<function prefer_self at 0x102adf2a8>)

Merge two tables together

The axes, samples and observations, can be controlled independently. Both can either work on ‘union’ or ‘intersection’.

merge_f is a function that takes two arguments and returns a value. The method is parameterized so that values can be added or subtracted where there is overlap in (sample_id, observation_id) values in the tables

sample_metadata_f and observation_metadata_f define how to merge metadata between tables. The default is to keep the metadata associated to self if self has metadata, and otherwise to take the metadata from other. These functions are given both metadata dicts and must return a single metadata dict

NOTE: There is an implicit type conversion to float. Tables using strings as the type are not supported. No check is currently in place.

NOTE: The return type is always that of self
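The role of merge_f can be illustrated with a simplified standalone sketch. merge_cells is a hypothetical helper, not the library method: it applies a single 'union' or 'intersection' choice to whole (observation_id, sample_id) cells rather than to each axis independently, and it ignores metadata. merge_f is applied only where both tables have the cell, and results are converted to float as noted above.

```python
from operator import add


def merge_cells(a, b, how="union", merge_f=add):
    """Merge two {(observation_id, sample_id): value} dicts.

    Hypothetical helper for illustration only; it is not part of biom-format.
    """
    if how == "union":
        keys = set(a) | set(b)
    elif how == "intersection":
        keys = set(a) & set(b)
    else:
        raise ValueError("how must be 'union' or 'intersection'")

    merged = {}
    for key in keys:
        if key in a and key in b:
            # Overlapping cell: combine the two values with merge_f.
            merged[key] = float(merge_f(a[key], b[key]))
        else:
            # Cell present in only one table: carry its value over.
            merged[key] = float(a[key] if key in a else b[key])
    return merged


a = {("O1", "S1"): 5, ("O2", "S1"): 3}
b = {("O1", "S1"): 2, ("O1", "S2"): 4}
union = merge_cells(a, b, how="union")
intersection = merge_cells(a, b, how="intersection")
print(union, intersection)
```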

nonzero()

Returns the locations of nonzero values within the data matrix

The values returned are (observation_id, sample_id)

normObservationByMetadata(obs_metadata_id)

Return new table with counts divided by specified metadata value

normObservationBySample()

Return new table with relative abundance in each sample
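Relative abundance within a sample means dividing every value in a sample's column by that sample's total. The standalone sketch below shows the arithmetic on plain lists; relative_abundance_by_sample is a hypothetical helper, not the library method, and the counts are made up.

```python
def relative_abundance_by_sample(rows):
    """Divide each value by the total of its sample (column).

    rows: list of observation rows; columns are samples.
    Hypothetical helper; a sample whose total is zero would raise
    ZeroDivisionError here.
    """
    totals = [sum(column) for column in zip(*rows)]
    return [
        [value / float(total) for value, total in zip(row, totals)]
        for row in rows
    ]


rows = [[5, 0, 1], [5, 3, 3]]
normalized = relative_abundance_by_sample(rows)
print(normalized)
```

After normalization, the values in each sample column sum to 1.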

normSampleByObservation()

Return new table with relative abundance in each observation

observationData(id_)

Return samples associated to an observation id

observationExists(id_)

Returns True if observation exists, False otherwise

reduce(f, axis)

Reduce over axis with f

axis can be either ‘sample’ or ‘observation’

sampleData(id_, conv_to_np=False)

Return observations associated to a sample id

sampleExists(id_)

Returns True if sample exists, False otherwise

setValueByIds(obs_id, samp_id, val)

Set the value in the matrix corresponding to (obs_id, samp_id)

sortByObservationId(sort_f=<function natsort at 0x102ad7320>)

Return a table sorted by sort_f

sortBySampleId(sort_f=<function natsort at 0x102ad7320>)

Return a table sorted by sort_f

sortObservationOrder(obs_order)

Return a new table in observation order

sortSampleOrder(sample_order)

Return a new table in sample order

sum(axis='whole')

Returns the sum by axis

axis can be:

  • ‘whole’ : whole matrix sum
  • ‘sample’ : return a vector with a sum for each sample
  • ‘observation’ : return a vector with a sum for each observation
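The three axis choices can be sketched on a plain list of rows. table_sum is a hypothetical standalone function, not the library method; it assumes rows are observations and columns are samples.

```python
def table_sum(rows, axis="whole"):
    """Sum a list of observation rows over the requested axis.

    Hypothetical helper for illustration only; it is not part of biom-format.
    """
    if axis == "whole":
        return sum(sum(row) for row in rows)
    if axis == "sample":
        # One total per column (sample).
        return [sum(column) for column in zip(*rows)]
    if axis == "observation":
        # One total per row (observation).
        return [sum(row) for row in rows]
    raise ValueError("unknown axis: %r" % (axis,))


rows = [[5, 0, 1], [0, 3, 0]]
whole = table_sum(rows)
per_sample = table_sum(rows, axis="sample")
per_observation = table_sum(rows, axis="observation")
print(whole, per_sample, per_observation)
```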

transformObservations(f)

Apply a function to each observation

f is passed a numpy vector and must return a vector

transformSamples(f)

Apply a function to each sample

f is passed a numpy vector and must return a vector

OTUTable

The OTUTable base class provides functionality specific to OTU tables. Currently, it only provides a static private member variable that describes its BIOM type. This object was stubbed out in case future methods are developed that do not make sense in the context of, say, an MG-RAST metagenomic abundance table. It is advised to always use an object that subclasses OTUTable if the analysis is on OTU data.

class biom.table.OTUTable

OTU table abstract class

PathwayTable

A table type to represent gene pathways.

class biom.table.PathwayTable

Pathway table abstract class

FunctionTable

A table type to represent gene functions.

class biom.table.FunctionTable

Function table abstract class

OrthologTable

A table type to represent gene orthologs.

class biom.table.OrthologTable

Ortholog table abstract class

GeneTable

A table type to represent genes.

class biom.table.GeneTable

Gene table abstract class

MetaboliteTable

A table type to represent metabolite profiles.

class biom.table.MetaboliteTable

Metabolite table abstract class

TaxonTable

A table type to represent taxonomies.

class biom.table.TaxonTable

Taxon table abstract class

Container classes

The container classes implement required private member variable interfaces as defined by the Table abstract base class. Specifically, these objects define the ways in which data is moved into and out of the contained data object. These are fully functional and usable objects; however, they do not implement table type specific functionality.

SparseTable

The subclass SparseTable can be derived for use with table data. This object implements all of the required private interfaces specified by the Table base class. The object contains a _data private member variable that is an instance of biom.table.SparseDict. It is advised to use derived objects of SparseTable if the data being operated on is sparse.

class biom.table.SparseTable(*args, **kwargs)

DenseTable

The DenseTable object fulfills all private member methods stubbed out by the Table base class. The dense table contains a private member variable that is an instance of numpy.array. The array object is a matrix that contains all values including zeros. It is advised to use this table only if the number of samples and observations is reasonable. Unfortunately, it isn’t reasonable to define reasonable in this context. However, if either the number of observations or the number of samples is > 1000, it would probably be a good idea to rely on a SparseTable.

class biom.table.DenseTable(*args, **kwargs)

Table type objects

The table type objects define variables and methods specific to a table type. Under the majority of situations, these are the objects that should be instantiated.

DenseOTUTable

class biom.table.DenseOTUTable(*args, **kwargs)

Instantiatable dense OTU table

SparseOTUTable

class biom.table.SparseOTUTable(*args, **kwargs)

Instantiatable sparse OTU table

DensePathwayTable

class biom.table.DensePathwayTable(*args, **kwargs)

Instantiatable dense pathway table

SparsePathwayTable

class biom.table.SparsePathwayTable(*args, **kwargs)

Instantiatable sparse pathway table

DenseFunctionTable

class biom.table.DenseFunctionTable(*args, **kwargs)

Instantiatable dense function table

SparseFunctionTable

class biom.table.SparseFunctionTable(*args, **kwargs)

Instantiatable sparse function table

DenseOrthologTable

class biom.table.DenseOrthologTable(*args, **kwargs)

Instantiatable dense ortholog table

SparseOrthologTable

class biom.table.SparseOrthologTable(*args, **kwargs)

Instantiatable sparse ortholog table

DenseGeneTable

class biom.table.DenseGeneTable(*args, **kwargs)

Instantiatable dense gene table

SparseGeneTable

class biom.table.SparseGeneTable(*args, **kwargs)

Instantiatable sparse gene table

DenseMetaboliteTable

class biom.table.DenseMetaboliteTable(*args, **kwargs)

Instantiatable dense metabolite table

SparseMetaboliteTable

class biom.table.SparseMetaboliteTable(*args, **kwargs)

Instantiatable sparse metabolite table
