Biblot

Reputation: 705

How to properly run find | parallel with grep + escape characters?

I have approximately 1500 files of 2 GB each in a folder and would like to extract the lines that match a regex. I tried:

find . -regex "filename pattern" -exec grep -P "pattern1\t|pattern2\t|pattern3\t|...|patternN\t" {} +

which works perfectly, but is pretty slow. I then read about running grep with GNU parallel, but couldn't figure out how to properly use it. Here's what I tried:

find . -regex "filename pattern" | parallel grep -P "pattern1\t|pattern2\t|pattern3\t|...|patternN\t" {}

along with a few variations of this command. However, I get in return:

/bin/bash: pattern1t: command not found
/bin/bash: pattern3t: command not found
/bin/bash: pattern2t: command not found
...

It seems the problem lies with the \t I use to ensure I match an entire string in a column of a TSV file. The grep command without parallel works perfectly with this regex.
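To see why the error appears, it helps to reproduce the effect without parallel. This is a minimal sketch that assumes the cause is a second round of shell parsing: parallel passes the command line to a shell, so the quotes typed on the outer command line are already gone by then. The pattern `foo\t|bar\t` is a hypothetical stand-in for the real one, and `bash -c` only imitates the re-parsing step.

```shell
#!/usr/bin/env bash
# Hypothetical pattern standing in for the elided one in the question.
pattern='foo\t|bar\t'

# Unquoted re-parse: the inner shell receives
#   printf '%s\n' foo\t|bar\t
# so "|" becomes a pipe and "\t" collapses to "t"; it then tries to run "bart".
unquoted=$(bash -c "printf '%s\n' $pattern" 2>&1)

# Quoted re-parse: printf %q escapes the pattern first, so the inner shell
# sees a single literal argument with the backslashes and "|" preserved.
quoted=$(bash -c "printf '%s\n' $(printf '%q' "$pattern")")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The first call fails with a "command not found" message of the same kind as above, while the second prints the pattern unchanged.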

How can I use escape characters in the grep regex with parallel?

Upvotes: 2

Views: 583

Answers (1)

Biblot

Reputation: 705

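As @Mark Setchell pointed out, I was missing the --quote option. Without it, parallel hands the command line to a shell, which treats each | as a pipe and strips the backslash from \t. Using -print0 together with -0 also keeps file names that contain whitespace intact. This solution works: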

find . -regex "filename pattern" -print0 | parallel -0 --quote grep -P "pattern1\t|pattern2\t|pattern3\t|...|patternN\t"
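For comparison, here is a sketch of a different route that is not the one from the answer: `xargs -P` instead of parallel. xargs runs grep directly rather than through a shell, so the pattern needs no extra quoting. The temporary directory, file names, and the pattern `foo\t|bar\t` are all made up for the demonstration, and `grep -P` assumes GNU grep.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: two small TSV files in a temporary directory.
workdir=$(mktemp -d)
printf 'foo\t1\nfoox\t2\n' > "$workdir/a.tsv"
printf 'bar\t3\nbaz\t4\n' > "$workdir/b.tsv"

# xargs executes grep itself (no intermediate shell), so the pattern
# arrives intact. -P 4 runs up to four grep processes at once and
# -n 1 hands each process a single file. -h suppresses file names.
matches=$(find "$workdir" -name '*.tsv' -print0 |
  xargs -0 -n 1 -P 4 grep -hP 'foo\t|bar\t' | sort)

printf '%s\n' "$matches"
rm -rf "$workdir"
```

One caveat with this variant: xargs does not group the output of each job the way parallel does, so lines from different files can interleave when several grep processes write at the same time.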

Upvotes: 4
