

How to process a list of files with SLURM

I'm new to SLURM. I want to process a list of files, assembled_reads/*.sorted.bam, in parallel. With the script below, however, only one process is used over and over again.

#SBATCH --job-name=****
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --partition=short
#SBATCH --time=12:00:00
#SBATCH --array=1-100
#SBATCH --mem-per-cpu=16000
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=****@***.edu
srun hostname

for FILE in assembled_reads/*.sorted.bam; do
  echo "${FILE}"
  OUTFILE=$(basename "${FILE}" .sorted.bam).raw.snps.indels.g.vcf
  # Look up the ploidy (column 4) of the metadata row whose first column matches this file
  PLDY=$(awk -F "," -v f="$FILE" '$1==f {print $4}' metadata.csv)
  PLDYNUM=$( [[ $PLDY = "haploid" ]] && echo "1" || echo "2")

  srun java"tmp" -jar GenomeAnalysisTK.jar \
    -R scaffs_HAPSgracilaria92_50REF.fasta \
    -T HaplotypeCaller \
    --emitRefConfidence GVCF \
    -ploidy "$PLDYNUM" \
    -nt 1 \
    -nct 24 \
    -I "${FILE}" \
    -o "${OUTFILE}"
  sleep 1 # pause to be kind to the scheduler
done

Upvotes: 4

Views: 5262

Answers (1)



You are creating a job array but never using it. Every array task runs the same script, so each of your 100 tasks loops over all the files. Replace the for loop with an index into the list of files, based on the Slurm job array task ID:

#SBATCH --job-name=****
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --partition=short
#SBATCH --time=12:00:00
#SBATCH --array=0-99
#SBATCH --mem-per-cpu=16000
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=****@***.edu
srun hostname

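# Collect the input files and pick the one that belongs to this array task
FILES=(assembled_reads/*.sorted.bam)
FILE=${FILES[$SLURM_ARRAY_TASK_ID]}

echo "${FILE}"
OUTFILE=$(basename "${FILE}" .sorted.bam).raw.snps.indels.g.vcf
# Look up the ploidy (column 4) of the metadata row whose first column matches this file
PLDY=$(awk -F "," -v f="$FILE" '$1==f {print $4}' metadata.csv)
PLDYNUM=$( [[ $PLDY = "haploid" ]] && echo "1" || echo "2")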

srun java"tmp" -jar GenomeAnalysisTK.jar \
  -R scaffs_HAPSgracilaria92_50REF.fasta \
  -T HaplotypeCaller \
  --emitRefConfidence GVCF \
  -ploidy "$PLDYNUM" \
  -nt 1 \
  -nct 24 \
  -I "${FILE}" \
  -o "${OUTFILE}"

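To check which file a given array task would pick, the indexing can be tried outside Slurm. This is only a sketch: pick_file is a hypothetical helper name, and it assumes the same assembled_reads/*.sorted.bam layout as in the question. Bash expands the glob in sorted order, so the mapping from index to file is the same in every task as long as the directory contents do not change.

```shell
#!/usr/bin/env bash
# pick_file DIR INDEX: print the INDEX-th (zero-based) *.sorted.bam file in DIR.
# Hypothetical helper for dry runs; it returns non-zero if no such file exists.
pick_file() {
  local dir=$1 idx=$2
  local files=("$dir"/*.sorted.bam)      # glob expands in sorted order
  # Fails for an index past the last file, and for an unmatched glob
  [[ -e ${files[idx]:-} ]] || return 1
  printf '%s\n' "${files[idx]}"
}

# In a job script Slurm sets SLURM_ARRAY_TASK_ID; for a dry run, pass an index by hand:
#   pick_file assembled_reads 0
#   FILE=$(pick_file assembled_reads "$SLURM_ARRAY_TASK_ID") || exit 1
```

The non-zero return lets an array task whose index has no matching file exit early instead of calling the tool with an empty file name.
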
Just make sure the --array range matches the number of files to process: Bash arrays are zero-indexed, so for N files use --array=0-(N-1), for example 0-99 for 100 files.
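Rather than counting the files by hand, the range can be computed at submission time. The following is a minimal sketch under assumptions: array_range is a hypothetical helper name, and job.sh stands for whatever the batch script above is saved as.

```shell
#!/usr/bin/env bash
# array_range DIR: print a zero-based --array range covering every
# *.sorted.bam file in DIR, e.g. "0-99" for 100 files.
# Hypothetical helper; it returns non-zero when DIR has no matching files.
array_range() {
  local dir=$1 n=0 f
  for f in "$dir"/*.sorted.bam; do
    # An unmatched glob stays as a literal pattern, so count only real files
    if [[ -e $f ]]; then
      n=$(( n + 1 ))
    fi
  done
  (( n > 0 )) || return 1
  echo "0-$(( n - 1 ))"
}

# Usage at submission time (job.sh is an assumed script name):
#   sbatch --array="$(array_range assembled_reads)" job.sh
```

Options given on the sbatch command line take precedence over the #SBATCH directives in the script, so the --array line inside the script can stay as a default.
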

Upvotes: 6
