<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="6.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Huttenhower, Curtis</style></author><author><style face="normal" font="default" size="100%">Hibbs, Matthew A</style></author><author><style face="normal" font="default" size="100%">Myers, Chad L</style></author><author><style face="normal" font="default" size="100%">Caudy, Amy A</style></author><author><style face="normal" font="default" size="100%">Hess, David C</style></author><author><style face="normal" font="default" size="100%">Troyanskaya, Olga G</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">The impact of incomplete knowledge on evaluation: an experimental benchmark for protein function prediction.</style></title><secondary-title><style face="normal" font="default" size="100%">Bioinformatics (Oxford, England)</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Bioinformatics</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2009</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2009 Sep 15</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">25</style></volume><pages><style face="normal" font="default" size="100%">2404-10</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">MOTIVATION: Rapidly expanding repositories of highly informative genomic data have generated increasing interest in methods for protein function prediction and inference of biological networks. 
The successful application of supervised machine learning to these tasks requires a gold standard for protein function: a trusted set of correct examples, which can be used to assess performance through cross-validation or other statistical approaches. Since gene annotation is incomplete for even the best studied model organisms, the biological reliability of such evaluations may be called into question. RESULTS: We address this concern by constructing and analyzing an experimentally based gold standard through comprehensive validation of protein function predictions for mitochondrion biogenesis in Saccharomyces cerevisiae. Specifically, we determine that (i) current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and (ii) incomplete functional annotations adversely affect the evaluation of machine learning performance. While computational approaches performed better than predicted in the face of incomplete data, relative comparison of competing approaches, even those employing the same training data, is problematic with a sparse gold standard. Incomplete knowledge causes individual methods' performances to be differentially underestimated, resulting in misleading performance evaluations. We provide a benchmark gold standard for yeast mitochondria to complement current databases and an analysis of our experimental results in the hopes of mitigating these effects in future comparative evaluations. 
AVAILABILITY: The mitochondrial benchmark gold standard, as well as experimental results and additional data, is available at http://function.princeton.edu/mitochondria.</style></abstract><issue><style face="normal" font="default" size="100%">18</style></issue><custom1><style face="normal" font="default" size="100%">http://www.ncbi.nlm.nih.gov/pubmed/19561015?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="6.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Hess, David C</style></author><author><style face="normal" font="default" size="100%">Myers, Chad L</style></author><author><style face="normal" font="default" size="100%">Huttenhower, Curtis</style></author><author><style face="normal" font="default" size="100%">Hibbs, Matthew A</style></author><author><style face="normal" font="default" size="100%">Hayes, Alicia P</style></author><author><style face="normal" font="default" size="100%">Paw, Jadine</style></author><author><style face="normal" font="default" size="100%">Clore, John J</style></author><author><style face="normal" font="default" size="100%">Mendoza, Rosa M</style></author><author><style face="normal" font="default" size="100%">Luis, Bryan San</style></author><author><style face="normal" font="default" size="100%">Nislow, Corey</style></author><author><style face="normal" font="default" size="100%">Giaever, Guri</style></author><author><style face="normal" font="default" size="100%">Costanzo, Michael</style></author><author><style face="normal" font="default" size="100%">Troyanskaya, Olga G</style></author><author><style face="normal" font="default" size="100%">Caudy, Amy A</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Computationally driven, quantitative experiments discover genes required for mitochondrial biogenesis.</style></title><secondary-title><style face="normal" 
font="default" size="100%">PLoS genetics</style></secondary-title><alt-title><style face="normal" font="default" size="100%">PLoS Genet.</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2009</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2009 Mar</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">5</style></volume><pages><style face="normal" font="default" size="100%">e1000407</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">Mitochondria are central to many cellular processes including respiration, ion homeostasis, and apoptosis. Using computational predictions combined with traditional quantitative experiments, we have identified 100 proteins whose deficiency alters mitochondrial biogenesis and inheritance in Saccharomyces cerevisiae. In addition, we used computational predictions to perform targeted double-mutant analysis detecting another nine genes with synthetic defects in mitochondrial biogenesis. This represents an increase of about 25% over previously known participants. Nearly half of these newly characterized proteins are conserved in mammals, including several orthologs known to be involved in human disease. Mutations in many of these genes demonstrate statistically significant mitochondrial transmission phenotypes more subtle than could be detected by traditional genetic screens or high-throughput techniques, and 47 have not been previously localized to mitochondria. We further characterized a subset of these genes using growth profiling and dual immunofluorescence, which identified genes specifically required for aerobic respiration and an uncharacterized cytoplasmic protein required for normal mitochondrial motility. 
Our results demonstrate that by leveraging computational analysis to direct quantitative experimental assays, we have characterized mutants with subtle mitochondrial defects whose phenotypes were undetected by high-throughput methods.</style></abstract><issue><style face="normal" font="default" size="100%">3</style></issue><custom1><style face="normal" font="default" size="100%">http://www.ncbi.nlm.nih.gov/pubmed/19300474?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="6.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Hibbs, Matthew A</style></author><author><style face="normal" font="default" size="100%">Myers, Chad L</style></author><author><style face="normal" font="default" size="100%">Huttenhower, Curtis</style></author><author><style face="normal" font="default" size="100%">Hess, David C</style></author><author><style face="normal" font="default" size="100%">Li, Kai</style></author><author><style face="normal" font="default" size="100%">Caudy, Amy A</style></author><author><style face="normal" font="default" size="100%">Troyanskaya, Olga G</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Directing experimental biology: a case study in mitochondrial biogenesis.</style></title><secondary-title><style face="normal" font="default" size="100%">PLoS computational biology</style></secondary-title><alt-title><style face="normal" font="default" size="100%">PLoS Comput. 
Biol.</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2009</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2009 Mar</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">5</style></volume><pages><style face="normal" font="default" size="100%">e1000322</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">Computational approaches have promised to organize collections of functional genomics data into testable predictions of gene and protein involvement in biological processes and pathways. However, few such predictions have been experimentally validated on a large scale, leaving many bioinformatic methods unproven and underutilized in the biology community. Further, it remains unclear what biological concerns should be taken into account when using computational methods to drive real-world experimental efforts. To investigate these concerns and to establish the utility of computational predictions of gene function, we experimentally tested hundreds of predictions generated from an ensemble of three complementary methods for the process of mitochondrial organization and biogenesis in Saccharomyces cerevisiae. The biological data with respect to the mitochondria are presented in a companion manuscript published in PLoS Genetics (doi:10.1371/journal.pgen.1000407). Here we analyze and explore the results of this study that are broadly applicable for computationalists applying gene function prediction techniques, including a new experimental comparison with 48 genes representing the genomic background. Our study leads to several conclusions that are important to consider when driving laboratory investigations using computational prediction approaches. 
While most genes in yeast are already known to participate in at least one biological process, we confirm that genes with known functions can still be strong candidates for annotation of additional gene functions. We find that different analysis techniques and different underlying data can both greatly affect the types of functional predictions produced by computational methods. This diversity allows an ensemble of techniques to substantially broaden the biological scope and breadth of predictions. We also find that performing prediction and validation steps iteratively allows us to more completely characterize a biological area of interest. While this study focused on a specific functional area in yeast, many of these observations may be useful in the contexts of other processes and organisms.</style></abstract><issue><style face="normal" font="default" size="100%">3</style></issue><custom1><style face="normal" font="default" size="100%">http://www.ncbi.nlm.nih.gov/pubmed/19300515?dopt=Abstract</style></custom1></record><record><source-app name="Biblio" version="6.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Hibbs, Matthew A</style></author><author><style face="normal" font="default" size="100%">Hess, David C</style></author><author><style face="normal" font="default" size="100%">Myers, Chad L</style></author><author><style face="normal" font="default" size="100%">Huttenhower, Curtis</style></author><author><style face="normal" font="default" size="100%">Li, Kai</style></author><author><style face="normal" font="default" size="100%">Troyanskaya, Olga G</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">Exploring the functional landscape of gene expression: directed search of large microarray compendia.</style></title><secondary-title><style face="normal" font="default" size="100%">Bioinformatics (Oxford, 
England)</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Bioinformatics</style></alt-title></titles><dates><year><style  face="normal" font="default" size="100%">2007</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2007 Oct 15</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">23</style></volume><pages><style face="normal" font="default" size="100%">2692-9</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">MOTIVATION: The increasing availability of gene expression microarray technology has resulted in the publication of thousands of microarray gene expression datasets investigating various biological conditions. This vast repository is still underutilized due to the lack of methods for fast, accurate exploration of the entire compendium. RESULTS: We have collected Saccharomyces cerevisiae gene expression microarray data containing roughly 2400 experimental conditions. We analyzed the functional coverage of this collection and we designed a context-sensitive search algorithm for rapid exploration of the compendium. A researcher using our system provides a small set of query genes to establish a biological search context; based on this query, we weight each dataset's relevance to the context, and within these weighted datasets we identify additional genes that are co-expressed with the query set. Our method exhibits an average increase in accuracy of 273% compared to previous mega-clustering approaches when recapitulating known biology. Further, we find that our search paradigm identifies novel biological predictions that can be verified through further experimentation. 
Our methodology provides the ability for biological researchers to explore the totality of existing microarray data in a manner useful for drawing conclusions and formulating hypotheses, which we believe is invaluable for the research community. AVAILABILITY: Our query-driven search engine, called SPELL, is available at http://function.princeton.edu/SPELL. SUPPLEMENTARY INFORMATION: Several additional data files, figures and discussions are available at http://function.princeton.edu/SPELL/supplement.</style></abstract><issue><style face="normal" font="default" size="100%">20</style></issue><custom1><style face="normal" font="default" size="100%">http://www.ncbi.nlm.nih.gov/pubmed/17724061?dopt=Abstract</style></custom1></record></records></xml>