Reputation: 1117
I have a bash script (chunks.sh) that executes several mini scripts in parallel, and I am wondering how to properly run chunks.sh so that it processes many folders in parallel. I have about 1000 folders with files that need to be processed. Here is my script:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=16:00:00
#SBATCH --output=mpi_output_%j.txt
#SBATCH --mail-type=FAIL
cd $SLURM_SUBMIT_DIR
module load gcc
module load gnu-parallel
module load bwa
module load samtools
parallel -j 10 < ../1convertfiles.sh
parallel -j 10 < ../2sortfiles.sh
parallel -j 10 < ../3indexfiles.sh
parallel -j 10 < ../4converttopile.sh
parallel -j 10 < ../5createconsensus.sh
parallel -j 10 < ../6concatenateconsensus.sh
Each folder has a name such as THAKID0001_dir, THAKID0010_dir, etc. How can I make this script loop through my current directory, find all the directories ending in _dir, and execute all of these mini scripts within each directory?
I tried putting my parallel commands into for loops, but that reran the mini scripts far too many times. I think I can use:
parallel -j 10 < 1convertfiles.sh ::: *_dir/*
parallel -j 10 < 2sortfiles.sh ::: *_dir/*
etc.
But with this logic it seems that the parallel command blocks will not all be running in the SAME directory at once. Each parallel line will find its own directory to work in, yet the mini scripts have to run in order, which is why I tried writing a for loop, but that created a huge mess.
Expected Results:
$ ./chunks.sh
### Should run the list of commands per folder ###
### For example, it will execute all the parallel commands in THAKID0001_dir, then all the parallel commands in THAKID0002_dir, etc. ###
TL;DR: How do I make chunks.sh execute these parallel command blocks for all directories with a certain tag (e.g. THAK*_dir), where each line runs only after the previous line has completed? Hope this made sense. Thank you!
Upvotes: 2
Views: 1338
Reputation: 14452
On the surface, the problem requires a helper script that performs the sequential processing:
process-dir.sh, placed in $SLURM_SUBMIT_DIR (and made executable):
#! /bin/bash
# Process all jobs for current folder, sequentially.
# Input: Folder, e.g. THAKID0001_dir
cd "$1" || exit 1
../1convertfiles.sh
../2sortfiles.sh
../3indexfiles.sh
../4converttopile.sh
../5createconsensus.sh
../6concatenateconsensus.sh
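Since the steps must run strictly in order, it may also be worth making the helper stop at the first failed step, so later steps never run on bad input. A minimal variant, assuming each mini script exits non-zero on failure:
#! /bin/bash
# Process all jobs for one folder, aborting at the first failure.
# Input: Folder, e.g. THAKID0001_dir
set -e        # any failing command (including the cd) aborts this folder
cd "$1"
../1convertfiles.sh
../2sortfiles.sh
../3indexfiles.sh
../4converttopile.sh
../5createconsensus.sh
../6concatenateconsensus.sh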
And then run it in parallel:
#! /bin/bash
cd "$SLURM_SUBMIT_DIR"
module load gcc
module load gnu-parallel
module load bwa
module load samtools
parallel -j10 ./process-dir.sh ::: *_dir
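With about 1000 folders and a 16-hour limit, GNU parallel's --joblog option is handy for recording which directories have finished; if the job hits the time limit and is resubmitted, adding --resume skips the ones already completed. A possible invocation (jobs.log is just an example file name):
parallel -j10 --joblog jobs.log --resume ./process-dir.sh ::: *_dir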
Or avoid the separate file process-dir.sh by defining a bash function directly:
#! /bin/bash
cd "$SLURM_SUBMIT_DIR"
module load gcc
module load gnu-parallel
module load bwa
module load samtools
process-dir() {
# Process all jobs for current folder, sequentially.
# Input: Folder, e.g. THAKID0001_dir
cd "$1"
../1convertfiles.sh
../2sortfiles.sh
../3indexfiles.sh
../4converttopile.sh
../5createconsensus.sh
../6concatenateconsensus.sh
}
export -f process-dir
parallel -j10 process-dir ::: *_dir
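The export -f process-dir line is essential here: GNU parallel starts a new shell for each job, and the function is only visible there once it has been exported. Before committing to a long run, the generated commands can be previewed with GNU parallel's --dry-run flag, which prints them without executing anything:
parallel --dry-run -j10 process-dir ::: *_dir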
Upvotes: 1