Reputation: 88
I'm trying to submit multiple jobs in parallel as a preprocessing step in sbatch using srun. The loop reads a file containing 40 file names and uses "srun command" on each file. However, not all files get sent off with srun, and the rest of the sbatch script continues once the ones that did get submitted finish. The real sbatch script is more complicated and I can't use job arrays with it, so that's not an option. This part should be pretty straightforward though.
I made this simple test case as a sanity check and it does the same thing. For every file name in the file list (40 in total) it should create a new file containing 'foo'. Every time I submit the script with sbatch, a different number of files gets sent off with srun.
#!/bin/sh
#SBATCH --job-name=loop
#SBATCH --nodes=5
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=1G
#SBATCH -A zheng_lab
#SBATCH -p exacloud
#SBATCH --error=/home/exacloud/lustre1/zheng_lab/users/eggerj/Dissertation/splice_net_prototype/beatAML_data/splicing_quantification/test_build_parallel/log_files/test.%J.err
#SBATCH --output=/home/exacloud/lustre1/zheng_lab/users/eggerj/Dissertation/splice_net_prototype/beatAML_data/splicing_quantification/test_build_parallel/log_files/test.%J.out
DIR=/home/exacloud/lustre1/zheng_lab/users/eggerj/Dissertation/splice_net_prototype/beatAML_data/splicing_quantification/test_build_parallel
SAMPLES=$DIR/samples.txt
OUT_DIR=$DIR/test_out
FOO_FILE=$DIR/foo.txt
# Create output directory
srun -N 1 -n 1 -c 1 mkdir $OUT_DIR
# How many files to run
num_files=$(srun -N 1 -n 1 -c 1 wc -l $SAMPLES)
echo "Number of input files: " $num_files
# Create a new file for every file in listing (run 5 at a time, 1 for each node)
while read F ;
do
fn="$(rev <<< "$F" | cut -d'/' -f 1 | rev)" # Remove path for writing output to new directory
echo $fn
srun -N 1 -n 1 -c 1 cat $FOO_FILE > $OUT_DIR/$fn.out &
done <$SAMPLES
wait
# How many files actually got created
finished=$(srun -N 1 -n 1 -c 1 ls -lh $OUT_DIR/*out | wc -l)
echo "Number of files submitted: " $finished
Here is my output log file the last time I tried to run it:
Number of input files: 40 /home/exacloud/lustre1/zheng_lab/users/eggerj/Dissertation/splice_net_prototype/beatAML_data/splicing_quantification/test_build_parallel/samples.txt
sample1
sample2
sample3
sample4
sample5
sample6
sample7
sample8
Number of files submitted: 8
Upvotes: 0
Views: 3578
Reputation: 59300
The issue is that srun forwards its stdin to the tasks it starts. Inside the loop, that stdin is $SAMPLES, so its contents are consumed, in an unpredictable way, by the cat commands that are started.
Try with
srun --input none -N 1 -n 1 -c 1 cat $FOO_FILE > $OUT_DIR/$fn.out &
The --input none parameter will tell srun to not mess with stdin.
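The mechanism can be reproduced without Slurm. The sketch below is only an illustration: `greedy` is a hypothetical stand-in for any command that reads its stdin (as srun does by default), and the three-line list is made up. It runs the stand-in in the foreground so the result is repeatable, whereas the backgrounded srun calls in the question race each other.

```shell
#!/bin/sh
# Hypothetical stand-in for srun: like srun by default, it reads
# whatever is on its stdin (here it simply discards it).
greedy() { cat > /dev/null; }

# Made-up sample list, written to a temporary file.
list=$(mktemp)
printf 'sample1\nsample2\nsample3\n' > "$list"

# Unprotected: greedy inherits the loop's stdin and drains the rest
# of the list, so the next read hits end-of-file.
broken=0
while read -r f; do
    greedy
    broken=$((broken + 1))
done < "$list"

# Protected: greedy gets /dev/null as stdin, so every line of the
# list still reaches read.
fixed=0
while read -r f; do
    greedy < /dev/null
    fixed=$((fixed + 1))
done < "$list"

echo "iterations without protection: $broken"
echo "iterations with protection:    $fixed"
rm -f "$list"
```

In the sbatch script, the --input none option plays the role that `< /dev/null` plays here: it stops the command inside the loop from reading the file the loop is iterating over.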
Upvotes: 1