aeonblue

Reputation: 23

GNU parallel ignores piped commands

I'm ultimately trying to use parallel as a simple job queue manager, à la here. The idea seems to be to put the commands in a file, have tail read the file (with the -f option, so that it keeps watching for new lines), and pipe tail's output into parallel. So I try

true > jobqueue; tail -n+0 -f jobqueue | parallel
echo echo {} ::: a b c >> jobqueue

but nothing happens. OK... to test things, I then just try

cat jobqueue | parallel

which gives

{} ::: a b c

Meanwhile

parallel echo {} ::: a b c

correctly outputs

a
b
c

So why does parallel ignore the parallel-ish syntax when it was fed from a file, but runs fine when it's given the command directly?

FWIW this is version 20160722, and since I don't have root access on the machine I had to build from source and install into my home directory.

Upvotes: 2

Views: 955

Answers (3)

aeonblue

Reputation: 23

I have managed to piece together a solution that (sufficiently) works in my situation, which is an elaboration of @John Bollinger's answer.

The point was that I wanted to pipe both commands and their arguments into parallel, where the arguments range over some set of values. For example, I might want to run command1 x for every x in 1..100. Parallel has an efficient built-in syntax for this, namely

parallel command1 ::: {1..100}

If that's all I wanted to do then that would be fine. But I also have command2 and command3 and so on, each taking ranges of arguments, and I'd like to feed all of these into parallel to manage. And I'd like to be able to add more commands after I've already started running parallel.

Using tail to continuously read some file jobqueue as I proposed in the question works as a job queue. To have tail and parallel exit upon some special string endoffile, I use

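tail -f jobqueue | while IFS= read -r LINE
do
    printf '%s\n' "$LINE"    # pass each line through to parallel unchanged
    # on the sentinel line, kill tail (a child of this shell) so the pipe closes
    [[ $LINE == *endoffile* ]] && pkill -P $$ tail
done | parallel -u -E endoffile    # -E: stop reading input at the "endoffile" line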

Then commands can be fed into jobqueue, but the ::: syntax no longer works. Things do work if we do it the old-fashioned way:

for i in {1..100}
do
    echo command1 $i >> jobqueue
done

Finally, the queue (and with it tail and parallel) is shut down with echo endoffile >> jobqueue.
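If writing that loop by hand for every command gets tedious, the expansion that ::: would normally do can be wrapped in a small function. This is a minimal bash sketch; the name enqueue and its argument order are hypothetical and not part of GNU parallel. It appends one "command argument" line per argument, shell-quoting each argument with printf %q so that arguments containing spaces survive when parallel hands the line to a shell.

```shell
#!/usr/bin/env bash
# enqueue QUEUE CMD ARG...  (hypothetical helper, not part of GNU parallel)
# Appends one line "CMD ARG" to QUEUE for every ARG, i.e. the expansion that
# "parallel CMD ::: ARG..." would do, written out as complete commands.
enqueue() {
    local queue=$1 cmd=$2
    shift 2
    local arg
    for arg in "$@"; do
        # CMD is written verbatim (it may hold options, e.g. "command1 -v");
        # ARG is shell-quoted so spaces or metacharacters survive.
        printf '%s %q\n' "$cmd" "$arg" >> "$queue"
    done
}
```

For example, enqueue jobqueue command1 {1..100} should write the same 100 lines as the for loop above. Alternatively, GNU parallel's --dry-run option prints the commands it would run instead of running them, so parallel --dry-run command1 ::: {1..100} >> jobqueue ought to produce equivalent lines while keeping the ::: syntax.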

Upvotes: 0

Ole Tange

Reputation: 33685

From man parallel:

There is a small issue when using GNU parallel as queue system/batch manager: You have to submit JobSlot number of jobs before they will start, and after that you can submit one at a time, and job will start immediately if free slots are available. Output from the running or completed jobs are held back and will only be printed when JobSlots more jobs has been started (unless you use --ungroup or -u, in which case the output from the jobs are printed immediately). E.g. if you have 10 jobslots then the output from the first completed job will only be printed when job 11 has started, and the output of second completed job will only be printed when job 12 has started.
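One workaround that follows from this excerpt, offered as an untested sketch rather than anything the manual recommends, is to pad the queue with no-op jobs so that the job-slot threshold is reached. The helper name pad_queue is hypothetical, and it assumes parallel was started with an explicit job count such as -j 4, so the number of slots to fill is known.

```shell
#!/usr/bin/env bash
# pad_queue QUEUE N  (hypothetical helper, not part of GNU parallel)
# Appends N no-op "true" jobs to QUEUE. Per the manual excerpt, parallel waits
# for JobSlot jobs before starting and holds back grouped output until JobSlot
# further jobs have started, so N should match the -j value parallel runs with.
pad_queue() {
    local queue=$1 n=$2 i
    for ((i = 0; i < n; i++)); do
        echo true >> "$queue"
    done
}
```

For instance, with parallel -j 4 reading the queue, running pad_queue jobqueue 4 after submitting real work should let pending jobs start and push out any held-back output; the extra true jobs finish immediately and print nothing. With -u/--ungroup the output is already printed as it appears, so there the padding would only matter for getting jobs started.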

Upvotes: 2

John Bollinger

Reputation: 180151

So why does parallel ignore the parallel-ish syntax when it was fed from a file, but runs fine when it's given the command directly?

Because that's what it is specified to do. What you're characterizing as "syntax" is defined in the manual as various command-line arguments and parts thereof. These seem mostly targeted at the case where the command to parallelize is given on parallel's command line, and the program input consists of data to operate upon. This is the mode of operation of the xargs program, which was one of the inspirations for parallel.

The fact is, you're making things more complicated than they need to be. When you run parallel without specifying a command on its command line, the commands you feed it via its input don't need the kind of input-line manipulation operations that parallel itself offers, and they can't, in general, take arguments any other way than on their own command line. When you run parallel in that mode, you just feed it the exact commands you want it to run:

true > jobqueue; tail -n+0 -f jobqueue | parallel
echo echo a b c >> jobqueue

or

true > jobqueue; tail -n+0 -f jobqueue | parallel
echo echo a >> jobqueue
echo echo b >> jobqueue
echo echo c >> jobqueue

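The first runs a single job that prints a b c; the second runs three separate jobs that print a, b and c respectively. Which one you want depends on exactly what you're after.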

As for nothing seeming to happen when you use tail -f to feed input to parallel, I'm inclined to think that parallel is waiting for more input. Its first read(s) do not return enough data to trigger it to dispatch any jobs, but the standard input is still open, so it has reason to think that more input will be coming (which indeed is appropriate). If you continue to feed it jobs, it will soon get enough input to start running them. When you're ready to shut down the queue, you must kill the tail command so that parallel knows it has reached the end of its input.
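The shutdown step can be sketched in bash as follows. The assumptions here are hypothetical and not part of the answer: the function name run_queue_demo, the temporary directory, the named pipe (FIFO) between tail and parallel, and the two-second grace period. Running tail as its own background job makes its PID available in $!, so the script can kill exactly that tail. Once it exits, the FIFO has no writer left, parallel sees end of input, runs whatever it has buffered, and exits.

```shell
#!/usr/bin/env bash
# run_queue_demo (hypothetical): tail-fed GNU parallel queue, shut down by killing tail.
run_queue_demo() {
    local dir queue fifo par_pid tail_pid
    dir=$(mktemp -d) || return 1
    queue="$dir/jobqueue"
    fifo="$dir/queue.fifo"
    : > "$queue"              # start from an empty queue file
    mkfifo "$fifo"

    # parallel reads complete commands, one per line, from the FIFO.
    parallel -u < "$fifo" &
    par_pid=$!

    # tail follows the queue file and is the FIFO's only writer.
    tail -n+0 -f "$queue" > "$fifo" &
    tail_pid=$!

    # Submit whole commands, as in the examples above.
    echo 'echo a' >> "$queue"
    echo 'echo b' >> "$queue"
    echo 'echo c' >> "$queue"

    sleep 2                   # assumed grace period for tail to pick up the lines
    kill "$tail_pid"          # last writer gone: parallel now sees end of input
    wait "$par_pid"           # parallel runs the buffered jobs, then exits
    rm -rf "$dir"
}
```

Calling run_queue_demo should print a, b and c (in whatever order the jobs finish). The sleep is a timing assumption; the sentinel-line approach in the other answer avoids it by having the reader decide when input is over.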

Upvotes: 2
