Reputation: 23
I'm ultimately trying to use parallel as a simple job queue manager, a la here. The idea seems to be to put the commands in a file, have tail read the file (using -f option so that it keeps looking for new lines), then pipe the output of tail into parallel. So I try
true > jobqueue; tail -n+0 -f jobqueue | parallel
echo echo {} ::: a b c >> jobqueue
but nothing happens. OK... to test things, I then just try
cat jobqueue | parallel
which gives
{} ::: a b c
Meanwhile
parallel echo {} ::: a b c
correctly outputs
a
b
c
So why does parallel ignore the parallel-ish syntax when it was fed from a file, but runs fine when it's given the command directly?
FWIW this is version 20160722, and since I don't have root access on the machine I had to build from source and install into my home directory.
Upvotes: 2
Views: 955
Reputation: 23
I have managed to piece together a solution that (sufficiently) works in my situation, which is an elaboration of @John Bollinger's answer.
The point was that I wanted to pipe both commands and arguments into parallel, with those arguments ranging over some set of values. For example, I might run command1 x with x ranging over 1..100. Now parallel has an efficient built-in syntax for this, namely
parallel command1 ::: {1..100}
If that's all I wanted to do then that would be fine. But I also have command2 and command3 and so on, each taking ranges of arguments, and I'd like to feed all of these into parallel to manage. And I'd like to be able to add more commands after I've already started running parallel.
Using tail to continuously read some file jobqueue, as I proposed in the question, works as a job queue. To have tail and parallel exit upon some special string endoffile, I use
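# Forward each queued line to parallel; stop tail once the sentinel appears.
tail -f jobqueue | while IFS= read -r LINE
do
    # Quote the line so whitespace and glob characters pass through unchanged.
    printf '%s\n' "$LINE"
    [[ $LINE == *endoffile* ]] && pkill -P $$ tail
done | parallel -u -E endoffile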
Then commands can be fed into jobqueue, but the ::: syntax no longer works. Things do work if we do it the old-fashioned way:
for i in {1..100}
do
echo command1 $i >> jobqueue
done
And then in the end things can be terminated with echo endoffile >> jobqueue.
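A small wrapper can take the tedium out of that loop. This is only a sketch: enqueue_each is a made-up name, not something parallel provides, and it assumes the arguments contain no whitespace or shell metacharacters, since each queued line is later run by a shell exactly as written.

```shell
# Hypothetical helper: append one job line per argument to a queue file.
# Usage: enqueue_each QUEUEFILE COMMAND ARG...
enqueue_each() {
    local queue=$1 cmd=$2
    shift 2
    local arg
    for arg in "$@"; do
        # One complete command line per argument, e.g. "command1 7".
        printf '%s %s\n' "$cmd" "$arg"
    done >> "$queue"
}
```

In bash, enqueue_each jobqueue command1 {1..100} lets the shell expand the range before the helper ever sees it, and echo endoffile >> jobqueue still shuts the queue down as before.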
Upvotes: 0
Reputation: 33685
From man parallel:
There is a small issue when using GNU parallel as queue system/batch manager: You have to submit JobSlot number of jobs before they will start, and after that you can submit one at a time, and job will start immediately if free slots are available. Output from the running or completed jobs are held back and will only be printed when JobSlots more jobs has been started (unless you use --ungroup or -u, in which case the output from the jobs are printed immediately). E.g. if you have 10 jobslots then the output from the first completed job will only be printed when job 11 has started, and the output of second completed job will only be printed when job 12 has started.
Upvotes: 2
Reputation: 180151
So why does parallel ignore the parallel-ish syntax when it was fed from a file, but runs fine when it's given the command directly?
Because that's what it is specified to do. What you're characterizing as "syntax" is defined in the manual as various command-line arguments and parts thereof. These seem mostly targeted at the case where the command to parallelize is given on parallel's command line, and the program input consists of data to operate upon. This is the mode of operation of the xargs program, which was one of the inspirations for parallel.
The fact is, you're making things more complicated than they need to be. When you run parallel without specifying a command on its command line, the commands you feed it via its input don't need the kind of input-line manipulation operations that parallel itself offers, and they can't, in general, take arguments any other way than on their own command line. When you run parallel in that mode, you just feed it the exact commands you want it to run:
true > jobqueue; tail -n+0 -f jobqueue | parallel
echo echo a b c >> jobqueue
or
true > jobqueue; tail -n+0 -f jobqueue | parallel
echo echo a >> jobqueue
echo echo b >> jobqueue
echo echo c >> jobqueue
, depending on what exactly you're after.
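To make the "exact commands" point concrete, here is a minimal sketch. The temporary file and the GNU check are my own additions for illustration. printf repeats its format once per argument, so the shell itself produces one complete command line per value, which is the job that ::: would otherwise do on parallel's command line.

```shell
# Build a queue with one complete command per line.
queue=$(mktemp)
printf 'echo %s\n' a b c > "$queue"

# Show exactly what parallel will be asked to run.
cat "$queue"

# Run the queue only if GNU parallel is the parallel on PATH.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    parallel -k < "$queue"    # -k keeps the output in input order
fi
```

Because the file is read to the end rather than followed with tail -f, parallel sees end-of-input here and exits on its own.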
As for nothing seeming to happen when you use tail -f to feed input to parallel, I'm inclined to think that parallel is waiting for more input. Its first read(s) do not return enough data to trigger it to dispatch any jobs, but the standard input is still open, so it has reason to think that more input will be coming (which indeed is appropriate). If you continue to feed it jobs then it will soon get enough input to start running them. When you're ready to shut down the queue you must kill the tail command so that parallel will know that it has reached the end of its input.
Upvotes: 2