Geremia

Reputation: 5656

`-a` vs. `cat`ting in GNU Parallel

When trying to process, for example, 128 bytes of a file at a time, this works:

cat input.dat | parallel --pipe --recend '' -k --block-size 128 "<command>" > output.dat

But this

parallel -a input.dat --pipe --recend '' -k --block-size 128 "<command>" > output.dat

prints these warnings:

parallel: Warning: A NUL character in the input was replaced with \0.
parallel: Warning: NUL cannot be passed through in the argument list.
parallel: Warning: Did you mean to use the --null option? 

Why?

Upvotes: 1

Views: 23

Answers (2)

Ole Tange

Reputation: 33740

--pipe always reads from STDIN. You cannot ask it to read from a file.

-a is an alias for ::::, which is roughly equivalent to ::: $(cat file). The file's contents are always treated as an input source for arguments, never as data for --pipe.
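
To illustrate the alias, here is a minimal sketch. The file names args.txt, via_a.txt and via_colons.txt are hypothetical, and the guard skips the run if GNU Parallel is not installed. Both invocations should read one argument per line from the file:

```shell
# Hypothetical argument file: one argument per line.
printf 'one\ntwo\nthree\n' > args.txt

# Run only if the installed "parallel" is GNU Parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # -a FILE and :::: FILE both use FILE as an input source of arguments.
  parallel -k -a args.txt echo "arg:" > via_a.txt
  parallel -k echo "arg:" :::: args.txt > via_colons.txt

  # Report whether the two forms produced identical output.
  cmp -s via_a.txt via_colons.txt && echo "same output"
fi
```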

This is because --pipe can be combined with input sources such as ::: and ::::, for example:

$ seq 1000000 | parallel --pipe 'echo {}; wc -l' ::: {a..z}
a
165668
b
149796
c
149796
d
149796
e
149796
f
149796
g
85352
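
Since --pipe reads STDIN, one way to avoid the extra cat process is to let the shell redirect the file. This is only a sketch: the sample data is made up, and cat stands in for the question's "<command>" so that the output can be compared with the input.

```shell
# Hypothetical 300-byte sample file containing NUL bytes (8 + 292 bytes).
{ printf 'abc\0def\0'; head -c 292 /dev/zero; } > input.dat

# Run only if the installed "parallel" is GNU Parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # The shell opens input.dat on STDIN, so --pipe sees the same stream
  # it would get from "cat input.dat |".
  parallel --pipe --recend '' -k --block-size 128 cat < input.dat > output.dat

  # With cat as the command and -k keeping order, the blocks should
  # concatenate back to the original file.
  cmp -s input.dat output.dat && echo "output matches input"
fi
```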

Upvotes: 1

Jokoyoski

Reputation: 177

The difference comes from how parallel handles input:

  1. cat input.dat | parallel ...:
    Piping the data in directly makes parallel treat it as a stream; --pipe splits it into chunks without interpreting NUL characters.

  2. parallel -a input.dat ...:
    The -a option reads input.dat as a list of arguments, where NUL characters cause warnings unless --null is used.
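
The stream/argument distinction can be seen without parallel at all. The sketch below assumes bash and uses made-up 7-byte data; it shows a NUL byte surviving a pipe but not a value that is used as an argument:

```shell
# A pipe is a byte stream: all 7 bytes, including the NUL, reach wc.
stream_bytes=$(printf 'abc\0def' | wc -c | tr -d ' ')

# An argument is a NUL-terminated C string, so it cannot contain a NUL.
# bash drops the byte during command substitution (and may print a warning).
arg=$(printf 'abc\0def')
arg_bytes=$(printf '%s' "$arg" | wc -c | tr -d ' ')

echo "stream: $stream_bytes bytes, argument: $arg_bytes bytes"
```

This is the same limitation the "NUL cannot be passed through in the argument list" warning refers to.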

To fix the issue with -a, use:

parallel -a input.dat --pipe --recend '' -k --block-size 128 --null "<command>" > output.dat

Upvotes: 1
