Event

February 22-25, 2015 Tysons, Virginia

Genomic Sciences Program (GSP) 2015

Contractors-Grantees Meeting XIII
Several additional software components were developed to overcome technical barriers that arose during this project. Originally implemented as command-line utilities for vocabulary extraction, annotation and document analysis, the individual components have since been developed into a set of libraries for text mining, information extraction, document classification and terminology development. The Semantic Desktop (above) is a Java application based on those libraries, and the components may alternatively be deployed in a web service container or integrated with third-party software. The above screenshot is part of a commercial case study using the Fairview Research Alexandria Patent Database, where we demonstrate the ability to reverse-engineer the logic that human indexers use to classify large corpora of technical documents, and to measure both the quality of previously annotated documents and the cohesion of individual document classifications.

Charles Parker and George Garrity will be presenting poster 222 (“Semantic Index of Phenotypic and Genotypic Data”, Abstract Book, page 333) during Tuesday evening’s mixer (5:00pm-7:00pm) in Tyson’s Ballroom. We will be highlighting our team’s recent work on Knowledge Extraction from scientific literature.

Our core technical objectives are to: (1) build a database of normalized phenotypic descriptions using the primary taxonomic literature of bacterial and archaeal type strains, (2) construct an ontology capable of making accurate phenotypic and environmental inferences based on that data, and (3) improve the visibility and accessibility of publicly available research data.

This project is tightly coupled with ongoing DOE projects (the Genomic Encyclopedia of Bacteria and Archaea, the Microbial Earth Project, the Community Science Program) and with two key publications, Standards in Genomic Sciences (SIGS) and the International Journal of Systematic and Evolutionary Microbiology (IJSEM).

The scope of this project covers many technical fields, including text mining, information extraction, natural language processing, indexing and search, terminology and ontology development, machine reasoning, semantic analysis, sequence analysis and taxonomic classification.

Parker et al., “Semantic Index of Phenotypic and Genotypic Data”

Download Abstract (48kB PDF) Download Poster (7.4MB PDF)

Posted February 20, 2015.
