Reputation: 317
The following script is used for running parallel subprocesses in bash; it is slightly changed from Running a limited number of child processes in parallel in bash?
#!/bin/bash
set -o monitor # enable job control, so that SIGCHLD is delivered when a background job exits
N=1000
todo_array=($(seq 0 $((N-1))))
max_jobs=5
trap add_next_job CHLD
index=0
function add_next_job {
    # start the next job, if any are left
    if [[ $index -lt ${#todo_array[@]} ]]
    then
        do_job $index &
        index=$(($index+1))
    fi
}
function do_job {
    echo $1 start
    # pick a random duration in {0.05, 0.10, ..., 0.50} seconds and sleep that long
    time=$(echo "scale=0;x=$RANDOM % 10;scale=5;x/20+0.05" | bc)
    sleep $time
    echo $time
    echo $1 done
}
# prime the pool with up to max_jobs parallel jobs
while [[ $index -lt $max_jobs ]] && [[ $index -lt ${#todo_array[@]} ]]
do
    add_next_job
done
wait
Each job picks a random number from the range 0.05:0.05:0.50 and sleeps for that many seconds.
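As a quick sanity check (not part of the script), the same bc expression can be run over all ten possible values of x to list the durations it can produce:
for x in $(seq 0 9); do echo "scale=5; $x/20 + 0.05" | bc; done
This prints .05000 through .50000 in steps of .05000, matching the float values in the sample output below.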
For example, with N=10, a sample output is
1 start
4 start
3 start
2 start
0 start
.25000
2 done
5 start
.30000
3 done
6 start
.35000
0 done
7 start
.40000
1 done
8 start
.40000
4 done
9 start
.05000
7 done
.20000
5 done
.25000
9 done
.45000
6 done
.50000
8 done
which has 30 lines in total.
But for a big N such as 1000, the result can be strange. One run gives 2996 lines of output, with 998 start lines, 999 done lines, and 999 lines with a float number; 644 and 652 are missing among the start lines, and 644 is missing among the done lines.
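These counts can be reproduced with wc and grep; a minimal sketch, assuming the script is saved as your_script:
./your_script > out.log
wc -l < out.log           # expect 3*N = 3000 lines; this run gave 2996
grep -c 'start$' out.log  # expect 1000; this run gave 998
grep -c 'done$' out.log   # expect 1000; this run gave 999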
These tests were run on Arch Linux with bash 4.2.10(2). Similar results can be reproduced on Debian stable with bash 4.1.5(1).
EDIT: I tried parallel from moreutils and GNU parallel for this test. parallel from moreutils has the same problem, but GNU parallel works perfectly.
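For reference, a minimal sketch of the same workload under GNU parallel (exported functions are supported by GNU parallel, but this exact invocation is my reconstruction, not part of the original test):
do_job() {
    echo $1 start
    time=$(echo "scale=0;x=$RANDOM % 10;scale=5;x/20+0.05" | bc)
    sleep $time
    echo $time
    echo $1 done
}
export -f do_job
seq 0 999 | parallel -j 5 do_job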
Upvotes: 3
Views: 892
Reputation: 1614
The only thing I can imagine is that you're running out of resources; check "ulimit -a" and look for "max user processes". If that's less than the number of processes you want to spawn, you will end up with errors.
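For example (the exact limit will differ from system to system):
$ ulimit -a | grep 'max user processes'
max user processes              (-u) 1000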
Try raising the limits for your user (if you're not running as root). On Redhat-ish systems you can do this by:
Adding this line to /etc/pam.d/login:
session required pam_limits.so
Adding the following content to /etc/security/limits.conf:
myuser soft nproc 1000
myuser hard nproc 1024
where "myuser" is the username who is granted the right, 1000 the default value of "max user processes" and 1024 the maximum number of userprocesses. Soft- and hard-limit shouldn't be too much apart. It only says what the user is allowed to set himself using the "ulimit" command in his shell.
So the myuser will start with a total of a 1000 processes (including the shell, all other spawned processes), but may raise it to 1024 using ulimit:
$ ulimit -u
1000
$ ulimit -u 1024
$ ulimit -u
1024
$ ulimit -u 2000
-bash: ulimit: max user processes: cannot modify limit: Operation not permitted
A reboot is not required; the change takes effect instantly.
Good luck! Alex.
Upvotes: 1
Reputation: 93710
I think this is just due to all of the subprocesses inheriting the same file descriptor and trying to append to it in parallel. Very rarely, two of the processes race, both start appending at the same location, and one overwrites the other. This is essentially the reverse of what one of the comments suggests.
You could easily check this by redirecting through a pipe, such as with your_script | tee file, because pipes have rules about the atomicity of data delivered by single write() calls smaller than a particular size.
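A minimal sketch of that check, reusing your_script from the question: capture the output once through a plain redirect and once through tee, then compare line counts. If the tee capture consistently has all 3*N lines while the plain redirect sometimes drops a few, the shared-descriptor race is the likely culprit.
./your_script > direct.log                 # children write via a shared file descriptor
./your_script | tee piped.log > /dev/null  # writes are funneled through a pipe
wc -l direct.log piped.log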
There's another question on SO that's similar to this (I think it just involved two threads both quickly writing numbers) where this is also explained, but I can't find it.
Upvotes: 2