
Analysis Stages

The MDI manages data analysis flow in two stages depicted below. Users and developers may be interested in working with either or both stages.

Stage 1: HPC Pipelines

We use ‘pipeline’ synonymously with ‘workflow’ to refer to a series of
analysis actions coordinated by scripts. Stage 1 pipelines are typically:

  • sample autonomous, i.e., they are executed “per sample”
  • executed once on an input data set, not iteratively
  • executed the same way on every sample, regardless of the experiment
  • hands-off, i.e., not interactive
  • resource intensive, in storage and/or CPU needs
  • executed on a high-performance computing (HPC) cluster
  • dependent on large input data files
  • capable of producing smaller output data files

These properties make Stage 1 pipelines well suited to being run by a core facility or data producer according to best practices. They are ideal for a cluster server, which the ‘mdi’ command-line utility helps manage.

Examples of Stage 1 pipeline actions are bulk image processing, training of machine learning algorithms, and read alignment to a genome.

The MDI pipelines framework does not encode data analysis pipelines themselves, which are found in other code repositories called ‘tool suites’. Instead, the framework encodes script utilities that:

  • allow YAML configuration files to be used to define a pipeline
  • wrap pipelines in a common command-line interface (CLI)
  • coordinate pipeline job submission to HPC schedulers
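
As a rough illustration of these three roles, the sketch below turns a small pipeline definition into one scheduler submission per sample. It is not the MDI's actual code or configuration schema: the `PIPELINE` keys, the `run_alignment.sh` script, and the `build_job_commands` helper are all hypothetical, a plain Python dictionary stands in for a parsed YAML file, and Slurm's `sbatch` is assumed as the HPC scheduler.

```python
import shlex

# Hypothetical pipeline definition standing in for a parsed YAML file.
# These keys are illustrative assumptions, not the MDI's real schema.
PIPELINE = {
    "name": "align",
    "command": "run_alignment.sh",  # hypothetical per-sample script
    "resources": {"cpus": 8, "memory": "32G"},
}


def build_job_commands(pipeline, samples):
    """Return one Slurm `sbatch` argument list per sample."""
    res = pipeline["resources"]
    commands = []
    for sample in samples:
        # Each sample is processed the same way, independently of the others.
        task = f"{pipeline['command']} {shlex.quote(sample)}"
        commands.append([
            "sbatch",
            f"--job-name={pipeline['name']}-{sample}",
            f"--cpus-per-task={res['cpus']}",
            f"--mem={res['memory']}",
            f"--wrap={task}",
        ])
    return commands


if __name__ == "__main__":
    # Print the commands rather than submitting them, so the sketch
    # can be run on a machine without a scheduler.
    for cmd in build_job_commands(PIPELINE, ["sampleA", "sampleB"]):
        print(shlex.join(cmd))
```

The point of the sketch is the separation of concerns: the pipeline definition is data, the per-sample loop is generic, and only the final command list is scheduler specific.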

Stage 2: Visualization Apps

Stage 2 applications, or “apps”, support interactive, graphical data visualization characterized by:

  • lower resource needs
  • execution by end users via a web interface
  • iterative execution, with adjustments for hypothesis testing, etc.
  • execution per sample set, project, etc. (i.e., multiple samples)
  • a need for the user’s detailed knowledge of the project
  • common analytical approaches applied to variable study designs
  • smaller processed data files as input
  • publication-ready images and other files as output

Examples of Stage 2 apps are R Shiny web tools that make interactive graphs and tables. Once again, the MDI framework does not carry the tools themselves; instead, it provides a common web interface where MDI apps from different tool suites can be easily loaded and developed.