Developing MicroPIE and a Microbial Ontology
The study of the evolution of microbial traits requires both phylogenetic and phenotypic trait information (also called phenomics). Next-generation sequencing has enabled high-throughput (meta)genomic analyses, but collecting phenotypic information, either de novo or from published taxonomic literature, to create character matrices is still tedious and time-consuming. I am part of a team of researchers developing tools to speed up the collection of microbial phenomic information from published literature. We have created a natural language processing tool, Microbial Phenomics Information Extractor, or MicroPIE, that uses existing parsers, machine-learning tools, and a library of microbe-specific terms derived from ~1000 taxonomic descriptions from the Archaea, Bacteroidetes, Cyanobacteria, and Mollicutes. We have also developed an ontology of terms found in prokaryotic taxonomic descriptions that is organized using a formal logical framework. This ontology will be used to assist MicroPIE in character identification and extraction, to facilitate the identification of trait synonyms used in prokaryotic taxonomic descriptions, and to populate character matrices with higher-level character states. The taxon-character matrices extracted using MicroPIE can be combined with phylogenomic trees and analyzed using the Arbor software package, a scalable, web-services-based platform for conducting phylogenetic comparative analyses to test evolutionary hypotheses. I’ll show some preliminary results from an analysis of trait evolution in cyanobacteria.