Implementing Metabolomics Analyses into Workflows: Towards Genome-Metabolome Large-Scale Data Fusion

Project Title: Implementing Metabolomics Analyses into Workflows: Towards Genome-Metabolome Large-Scale Data Fusion

Organisation: University of Birmingham
Applicants: Prof. Mark Viant, Prof. John Colbourne & Dr Robert Davidson
Project Duration: 01 April 2013 – 01 October 2013
NERC Reference: NE/K011294/1

Summary of proposed research: “Genomic and post-genomic studies are transforming our mechanistic understanding of organism-environment interactions.” While this statement is certainly true, it masks many of the major challenges that have had to be overcome during the last decade. Today, genomics approaches are widely used by researchers from across the breadth of NERC science, utilising established (and ever cheaper) technologies and analysis pipelines, and delivering high-impact publications. The same cannot yet be said for metabolomics, which is a considerably less mature approach, both analytically and computationally. The analytical challenges in metabolomics have restricted its use to experts in analytical chemistry, while the computational challenges have restricted the knowledge that can be mined from these rich datasets. Here we address the latter point, drawing from the wisdom and experience of genomics researchers.

One of the reasons for the success of environmental genomics is that biologists, without an in-depth knowledge of biostatistics and programming, have been able to construct and execute next-generation sequencing (NGS) data analyses using standardised workflows. Galaxy (http://galaxyproject.org/) – headlined as “Online bioinformatics analysis for everyone” – has emerged as the leading open-source workflow platform for NGS data analysis, with many standard processing tools accessible from its web-based user interface. This workflow software is also being applied successfully to proteomics and chemo-informatics. Researchers at BGI (Beijing Genomics Institute) in China, our Project Partner on this application, have considerable expertise in Galaxy, since this web-based data analysis and workflow system forms the basis of BGI's data analysis platform. They also have close links with the Galaxy development team.

We propose to ‘hop’ Dr Davidson from Professor Viant’s environmental metabolomics laboratory and NBAF-B at the University of Birmingham into a computational laboratory at BGI-Hong Kong. Here he will gain specialist expertise in Galaxy workflows and implement our existing metabolomics pipelines into Galaxy. This is an extremely important step towards making metabolomics analysis pipelines more effective (by integrating powerful algorithms from the ever-growing toolbox of metabolomics analysis methods), more standardised (enabling greater cross-comparison of results from different studies), and considerably more accessible to biologists. Our aim is for both data and analysis tools to be accessible from a software platform that provides a single, user-friendly interface for developing computational pipelines in a form that can be shared and reused by the environmental community.