Analysis Stages

Overview
Stage 1: HPC Pipelines
Stage 2: Visualization Apps

Overview

The MDI manages data analysis flow in two stages that are handled differently, as depicted below.

The critical distinctions between Stage 1 Pipelines and Stage 2 Apps are enumerated below. Users and developers may be interested in working with either or both stages depending on their needs.

Stage 1: HPC Pipelines

We use ‘pipeline’ synonymously with ‘workflow’ to refer to a series of analysis actions coordinated by scripts. Stage 1 pipelines are generally:

sample autonomous, i.e., they are executed “per sample”
executed once on an input data set, not iteratively
executed the same way on every sample, regardless of the experiment
hands-off, i.e., not interactive
resource intensive, in storage and/or CPU needs
executed on a high-performance computing (HPC) cluster
dependent on large input data files
capable of producing smaller output data files

The above properties make Stage 1 pipelines well suited to being run by a core facility or data producer according to agreed-upon best practices. They are also ideal for a cluster server, which the ‘mdi’ command line utility helps manage.

Examples of Stage 1 pipeline actions are bulk image processing, training of machine learning algorithms, and read alignment to a genome.
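
To make the "per sample" and "large input, smaller output" properties concrete, here is a minimal Python sketch. It is not MDI code: the function names `run_pipeline` and `run_all` and the toy inputs are hypothetical, chosen only to show the same logic being applied independently to each sample.

```python
def run_pipeline(sample_id, records):
    """Process one sample on its own: large input in, small summary out."""
    return {
        "sample": sample_id,
        "n_records": len(records),
        "total": sum(records),
    }


def run_all(samples):
    """Apply the identical pipeline to every sample, one sample at a time."""
    # No sample depends on another, so each call could instead be submitted
    # as a separate job on an HPC cluster.
    return [run_pipeline(sid, recs) for sid, recs in samples.items()]


# Toy stand-in inputs (hypothetical; real inputs would be large data files).
samples = {"sampleA": range(1000), "sampleB": range(2000)}
for summary in run_all(samples):
    print(summary)
```

Because each call to `run_pipeline` sees only its own sample, the work can be run once, hands-off, without knowledge of the wider experiment.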

The pipelines framework does not encode data analysis pipelines themselves, which are found in other code repositories called ‘tool suites’. Instead, the framework encodes script utilities that:

allow YAML configuration files to be used to define a pipeline
wrap pipelines in a common command-line interface (CLI)
coordinate pipeline job submission to HPC schedulers
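
The three roles listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MDI's implementation: the configuration keys, the `align.sh` script, and the function names are all hypothetical, and the dictionary stands in for what a YAML parser would return from a configuration file. The scheduler command is built for Slurm's `sbatch` purely as an example, and it is only constructed here, never executed.

```python
import argparse
import shlex

# Hypothetical pipeline definition, as it might look after parsing a YAML file.
# These keys are illustrative only and are NOT the MDI's actual schema.
PIPELINE_CONFIG = {
    "pipeline": "align",
    "actions": {
        "align": {
            "command": "bash align.sh",  # hypothetical script name
            "resources": {"cpus": 8, "mem": "32G"},
        },
    },
}


def build_parser(config):
    """Wrap a configured pipeline in a uniform command-line interface."""
    parser = argparse.ArgumentParser(prog=config["pipeline"])
    parser.add_argument("action", choices=sorted(config["actions"]))
    parser.add_argument("--sample", required=True, help="sample to process")
    return parser


def build_submit_command(config, action, sample):
    """Build (but do not run) a Slurm sbatch command for one sample."""
    spec = config["actions"][action]
    resources = spec["resources"]
    job = f"{spec['command']} {shlex.quote(sample)}"
    return [
        "sbatch",
        f"--job-name={config['pipeline']}-{action}-{sample}",
        f"--cpus-per-task={resources['cpus']}",
        f"--mem={resources['mem']}",
        f"--wrap={job}",
    ]


def main(argv):
    """Parse CLI arguments and return the scheduler command they imply."""
    args = build_parser(PIPELINE_CONFIG).parse_args(argv)
    return build_submit_command(PIPELINE_CONFIG, args.action, args.sample)


print(" ".join(main(["align", "--sample", "sampleA"])))
```

Keeping the pipeline definition in data rather than in code is what lets one generic wrapper serve many pipelines: a new pipeline needs only a new configuration entry, not a new command-line parser or submission script.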

Stage 2: Visualization Apps

Once data are processed by pipelines, Stage 2 applications, or “apps”, support interactive, graphical data visualization characterized by:

lower resource needs
execution by end users via a web interface
iterative execution, with adjustments for hypothesis testing, etc.
execution per sample set, project, etc. (i.e., multiple samples)
a need for the user’s detailed knowledge of the project
common analytical approaches applied to variable study designs
smaller processed data files as input
publication-ready images and other files as output

Examples of Stage 2 apps are R Shiny web tools that make interactive graphs and tables. Once again, the MDI framework does not carry the tools themselves; instead, it provides a common web interface where all MDI apps can be easily loaded. Developers provide apps within their tool suites.
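
The contrast with Stage 1 can be illustrated with a short Python sketch of the iterative, multi-sample pattern. It is not an MDI app (real apps are R Shiny tools provided by tool suites): the names `fraction_above` and `summarize_sample_set` are hypothetical, and the values are toy numbers made up only so the example runs.

```python
# Hypothetical small per-sample outputs from Stage 1. The numbers are toy
# values invented for this sketch; they are not real results.
STAGE1_OUTPUTS = {
    "sampleA": [0.2, 0.9, 0.7],
    "sampleB": [0.4, 0.1, 0.8],
}


def fraction_above(values, threshold):
    """Return the fraction of values strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values)


def summarize_sample_set(outputs, threshold):
    """Compare all samples in a set under one user-chosen threshold."""
    return {
        sample: fraction_above(values, threshold)
        for sample, values in outputs.items()
    }


# A user adjusts the threshold and re-runs; each pass is cheap because the
# inputs are the small processed files, not the original raw data.
for threshold in (0.3, 0.5):
    print(threshold, summarize_sample_set(STAGE1_OUTPUTS, threshold))
```

In an interactive app, the loop over thresholds would be replaced by a control in the web interface, with the plot or table refreshed after each change.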
