Action Options
Each action offered by a pipeline has a rich set of options. For example, here is the inline help for the svCapture align action:
```text
$ mdi svCapture align
svCapture: Characterize structural variant junctions in short-read, paired-end capture library(s)
align: clean paired-end reads and align to reference genome; output name-sorted bam/cram
fastq-input:
  -i,--input-dir <string> expects <input-dir>/<input-name>/*.fastq.gz, <input-dir>/<input-name>_*.fastq.gz, or .sra *REQUIRED*
  -I,--input-name <string> see --input-dir for details; defaults to --data-name if null [null]
<truncated for brevity>
output:
  -O,--output-dir <string> the directory where output files will be placed; must already exist *REQUIRED*
  -N,--data-name <string> simple name for the data (e.g., sample) being analyzed (no spaces or periods) *REQUIRED*
version:
  -v,--version <string> the version to use of the tool suite that provides the requested pipeline [latest]
resources:
  -m,--runtime <string> execution environment: one of direct, container, or auto (container if supported) [auto]
  -p,--n-cpu <integer> number of CPUs used for parallel processing [1]
  -r,--ram-per-cpu <string> RAM allocated per CPU (e.g., 500M, 4G) [4G]
  -t,--tmp-dir <string> directory used for small temporary files (recommend SSD) [/tmp]
  -T,--tmp-dir-large <string> directory used for large temporary files (generally >10GB) [/tmp]
job-manager:
  --email <string> email address of the user submitting the job [nobody@nowhere.edu]
  --account <string> name of the account used to run a job on the server [NA]
  --time-limit <string> time limit for the running job (e.g., dd-hh:mm:ss for slurm --time) [10:00]
  --partition <string> slurm --partition (standard, gpu, largemem, viz, standard-oc) [standard]
workflow:
  -f,--force <boolean> execute certain actions and outcomes without prompting (create, rollback, etc.)
  -R,--rollback <integer> revert to this pipeline step number before beginning at the next step (implies --force) [null]
  -q,--quiet <boolean> suppress the configuration feedback in the output log stream
help:
  -h,--help <boolean> show pipeline help
  -d,--dry-run <boolean> only show parsed variable values; do not execute the action
```
Action-specific options
Typically, a pipeline exposes a series of action-specific options. These are listed first in the help output and can be specified at the command line in either short format, e.g., -i, or long format, e.g., --input-dir. Long-format names, without the leading dashes, are used as keys in data.yml files.
For clarity of organization and ease of reuse, options are organized into families, leading to data.yml files with entries like:
```yaml
# data.yml
align:                     # the pipeline action
  fastq-input:             # the option family
    input-dir: /data/path  # a single option value
```
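As a fuller sketch, a data.yml that sets the three options marked *REQUIRED* in the help output above might look like the following. Each option sits under the family it is listed in, and the comments note the matching short and long command-line forms. The paths and the sample name are placeholders, not real locations.

```yaml
# data.yml (sketch; paths and sample name are placeholders)
align:
  fastq-input:
    input-dir: /data/path      # -i,--input-dir
  output:
    output-dir: /project/path  # -O,--output-dir; must already exist
    data-name: sample_1        # -N,--data-name; no spaces or periods
```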
Standard options
Other options are either mandated or offered by the MDI pipelines framework itself and are added automatically to every pipeline action. Two of them are critically important because they tell every MDI pipeline where to write its output files:
- output-dir = the destination directory for an analysis project
- data-name = the sub-folder within --output-dir for each sample analyzed
Thus, the following job configuration file:
```yaml
# data.yml
align:
  output:
    output-dir: /project/path
    data-name:
      - sample_1
      - sample_2
```
would write output for two analyzed samples to folders:
- /project/path/sample_1
- /project/path/sample_2
Under the MDI Code of Conduct, pipelines may only write output files within --output-dir.
Among other useful common options, --dry-run shows the parsed option values without executing the action, so you can check an action and its configuration before a real run. Every pipeline also accepts the --n-cpu and --ram-per-cpu options for parallel processing, but they only take effect if the pipeline developer has added support for them.
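Because long option names double as data.yml keys, these resource requests can presumably be written into a job file as well. The sketch below assumes the resources family from the help output nests under the action like any other family; the values 8 and 4G are illustrative only. If RAM is allocated per CPU, as the option name indicates, this example would request 8 × 4G = 32G in total.

```yaml
# data.yml (sketch; values are illustrative, not recommendations)
align:
  resources:
    n-cpu: 8          # -p,--n-cpu; only helps if the action supports parallel processing
    ram-per-cpu: 4G   # -r,--ram-per-cpu; assumed total = n-cpu x ram-per-cpu = 32G
```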