Workflow approaches to investigation of biological complexity
Prepared by: Prof T R Meagher, Dr M C Rivers and Prof P Kille
Many scientific challenges in environmental ‘omics draw on diverse data from multiple sources and span multiple levels of biological organisation. Research in environmental ‘omics therefore requires novel approaches that are inherently equipped to deal with data diversity and complexity.
The scientific workflow approach offers a way to address the biological complexity inherent in environmental ‘omics. A scientific workflow is a series of linked components, such as data computation, manipulation and analysis steps, used in scientific problem-solving. Scientific workflows help to visualise, in a user-friendly way, how data flow through (potentially) complex computations.
Biological systems can manifest complexity in many ways. For example, genomic information integrated across multiple species in a particular ecosystem, now feasible thanks to advances in genome sequencing technology, can be coupled with information about functional domains within genomes, and their context-dependent properties within specific organisms, to link genomic information to larger-scale processes of adaptation and ecosystem function. One of the challenges of integrating across multiple levels is that the information available at each level is dynamic and subject to changing biological interpretation, such as a revised understanding of genome function or a changed taxonomic classification that refines knowledge of the evolutionary history of the system. Viewing this complexity through a workflow approach allows not only the identification and interpretation of higher-order emergent properties of complex systems, but also the integration of models that take changing information into account.
In this first workshop of the STFC/NERC Bioinformatics & Environmental ‘Omics Network, we will explore the nature of workflow approaches and how they can be applied to specific scientific challenges in ‘omics. Future workshops will address more targeted scientific themes, building on the methodological approaches considered in this inaugural workshop.
- Technical challenges
  - Computational/algorithmic capacity and applications
  - Data ontologies – communication between different data sources
  - Limitations? Computer power versus parameter scanning & optimisation?
  - How can one automate established practice?
- Scientific challenges
  - Bioinformatics and taxonomy
  - Environmental genomics
  - Function-oriented analysis of biological diversity quality
- Interaction challenges
  - Matching scientific expertise with technical expertise
  - How much is dependent on human experts versus algorithmic interfaces?
  - How can trial and error be disentangled from the final outcome, e.g. what happens to intermediate steps and “dead-end” outputs?
DAY 1 (11:00 AM start, coffee from 10:30)
11:00-11:30 Introduction (30 minutes)
Tom Meagher & Pete Kille – Initial plenary session to outline the planned (work)flow for the workshop and the intended trajectory towards a workshop product. Technologies will be highlighted and scientific and technical issues will be framed.
11:30-12:30 Scientific challenges (1 hour)
- Alfried Vogler – Workflows for data mining in phylogenetics
- Melody Clark – Data challenges in ecology and evolution
- Philipp Antczak – Functional biology workflows: from tools to integrated platforms
12:30-13:30 Lunch (1 hour)
13:30-14:30 Breakout group discussion guided by challenge questions (1 hour)
14:30-15:00 Plenary reporting session (30 minutes)
15:00-15:30 Break (30 minutes)
15:30-16:30 Technical challenges (1 hour)
- Alex Hardisty – BioVeL: An e-Infrastructure and e-Science environment supporting research on biodiversity
- Aleksandra Pawlik – Taverna workflows: provenance and reproducibility
- Brian Matthews – Supporting scientific processes in National Facilities
16:30-17:30 Breakout group discussion guided by challenge questions (1 hour)
17:30-18:00 Plenary reporting session (30 minutes)
DAY 2
09:00-09:30 Plenary recap of emergent topics from Day 1 (30 minutes)
09:30-10:00 Interaction challenges: plenary session to facilitate the self-organisation of participants into cognate subgroups (30 minutes)
10:00-10:30 Coffee (30 minutes)
10:30-12:00 Breakout into cognate subgroups to develop potential research agendas to address emergent scientific opportunities for utilisation of the workflow approach (1 hour 30 minutes)
12:00-12:30 Plenary for presentation and feedback (30 minutes)
12:30-13:00 Wrap up and next steps (30 minutes)
13:00 Lunch and departure
Resources and bibliography:
Curcin, V. & Ghanem, M. (2008) Scientific workflow systems – can one size fit all? Cairo International Biomedical Engineering Conference (CIBEC 2008), pp. 1-9. http://www.doc.ic.ac.uk/~vc100/papers/Scientific_workflow_systems.pdf
Gil, Y., Deelman, E., et al. (2007) Examining the Challenges of Scientific Workflows. Computer 40(12): 24-32.
Jones, M.B., Schildhauer, M.P., Reichman, O.J. & Bowers, S. (2006) The New Bioinformatics: Integrating Ecological Data from the Gene to the Biosphere. Annu. Rev. Ecol. Evol. Syst. 37: 519-544. http://www.pnamp.org/sites/default/files/Jones2006_AREES.pdf
Peleg, M., Yeh, I. & Altman, R. (2002) Modelling biological processes using workflow and Petri Net models. Bioinformatics 18(6): 825-837. http://bioinformatics.oxfordjournals.org/content/18/6/825.full.pdf
Reichman, O.J., Jones, M.B. & Schildhauer, M.P. (2011) Challenges and Opportunities of Open Data in Ecology. Science 331: 703-705. http://www.planta.cn/forum/files_planta/challenges_and_opportunities_of_open_data_in_ecology_121.pdf
Examples of scientific workflow management systems.
Participants:
| Name | Institution | Email |
|---|---|---|
| Antczak, Philipp | University of Liverpool | P.Antczak@liverpool.ac.uk |
| Clark, Melody | British Antarctic Survey | |
| Falciani, Francesco | University of Liverpool | |
| Galay-Burgos, Malyka | European Centre for Ecotoxicology and Toxicology of Chemicals (ECETOC) | |
| Grainger, Alan | University of Leeds | A.Grainger@leeds.ac.uk |
| Hallinan, Jennifer | Newcastle University | |
| Hardisty, Alex | Cardiff University | HardistyAR@cardiff.ac.uk |
| Kille, Peter | Cardiff University | |
| Linard, Benjamin | Imperial College | |
| Meagher, Thomas | University of St Andrews | |
| Nic Lughadha, Eimear | Royal Botanic Gardens, Kew | E.NicLughadha@kew.org |
| Pascoe, Stephen | Centre for Environmental Data Archival (CEDA) | Stephen.Pascoe@stfc.ac.uk |
| Pawlik, Aleksandra | Software Sustainability Institute | |
| Rivers, Malin | University of St Andrews / Royal Botanic Gardens, Kew | |
| Scheremetjev, Maxim | European Bioinformatics Institute (EBI) | |
| Simmons, Michael | University of Cambridge | |
| Sreenivasaprasad, Prasad | University of Bedfordshire | |
| Vogler, Alfried | NHM London / Imperial College | |
| Watson, Mick | Roslin Institute | |