Data Science and Bioinformatics explained

Bioinformatics involves using computational tools to acquire, store, organize, archive, analyze, and visualize biological data, including medical and behavioral data as well as associated environmental and health data. It also involves interpreting the results of these computational analyses. Bioinformatics overlaps strongly with the fields of computational biology and statistics. Computational biology is the development of computational tools (including algorithms and software) for studying biology, while bioinformatics focuses more on the application and interpretation of these tools.

Interdisciplinary by design

Bioinformatics is interdisciplinary by nature because the computational tools it relies on typically combine methods from computer science (e.g., software engineering, databases, machine learning), mathematics (statistics), and engineering (modeling).

The term "bioinformatician" is a general label for biologists with training in computer science, mathematics, or engineering, but it also applies to non-biologists who use techniques from their own fields to address biological problems. As such, "bioinformatician" covers a broad spectrum of scientists from various fields, all united by their interest in solving biological problems.

How Data Science and Bioinformatics enhance research

Bioinformatics is about scaling a research project's size, scope, and impact. It can help investigators design experiments that maximize the quality and quantity of the data they produce. It can also streamline the analysis of that data and expand the range of analyses an investigator can perform. For example, computational tools can be designed to process large amounts of information quickly and to detect subtle correlations and patterns in large, multidimensional datasets. Additionally, sophisticated tools and techniques borrowed from other fields (machine learning, mathematical modeling, etc.) expand the set of hypotheses an investigator can test.

Montana INBRE's Impact on Data Science and Bioinformatics in Montana

MT INBRE has been hard at work creating and maintaining bioinformatics resources throughout Montana. Between 2004 and 2014, the first and second iterations of the MT INBRE Bioinformatics Core made great strides in advancing bioinformatics education, tools, resources, and culture throughout the state. Highlights of our accomplishments include:

  • Developing formal courses that give students well-rounded instruction in bioinformatics, including algorithm development, open-source bioinformatics tools, and specialized bioinformatics tasks. Courses include:
    • Computational Biology
    • Genome Analysis
    • Advanced Bioinformatics
    • Introduction to Programming using Python and BioPython
    • Microbial Physiology
    • Research Methods in Molecular Microbiology
    • Advanced Genetics
  • Funding the Bioinformatics Teaching and Research Lab at MSU, including computational resources, advanced courses for students, and informal training opportunities
  • Creating a 130-CPU computer cluster at MSU, composed of a 10-terabyte server and 21 quad-core research workstations that were then distributed to laboratories in seven departments and centers on the MSU campus
  • Creating opportunities for investigators and students to access a larger Systems Biology Computational Cluster
  • Forming the Bioinformatics Users Group (BUG) to strengthen interdisciplinary collaboration in bioinformatics and to increase communication among researchers in Wyoming, Alaska, Colorado, Montana, and Washington
  • Supporting several core facilities and labs around Montana, including:

Research scope

Our project portfolio covers a wide range of biological subdisciplines, including but not limited to molecular biology, cellular biology, infectious diseases, ecology, biochemistry, and plant and animal science. The Bioinformatics Core has the expertise to process genomics data (genome assembly and annotation, metagenomics, comparative genomics, SNP calling, phylogenetics), transcriptomics data (microarrays, RNA-seq, metatranscriptomics), proteomics and protein-DNA interaction data (ChIP-seq), and metabolomics data (mass spectrometry, NMR). We also have expertise in programming, high-performance computing, cloud computing, and database development.

Jeffrey Good
Data Science Core Director

Eric Raile
Data Science Core Co-Director