Use GNU parallel to execute a series of greps?

Question

I have the following chain of greps:

grep -E '[0-9]{3}\.[0-9]+ ms' file.log | grep -v "Cycle Based" | grep -Ev "[0-9]{14}\.[0-9]+ ms" > pruned.log

I need to run it on a 10 GB log file. It takes longer than I am willing to wait, so I am trying to use GNU parallel, but it is not clear to me how to run this chain of greps with parallel.

This is not a question about how to run the fastest possible single grep; it is about how to run a series of greps in parallel.
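To make the filter concrete, here is a sketch that feeds a few made-up log lines (hypothetical, not taken from the real file.log) through the same three greps. `demo_chain` is an illustrative name. The patterns are unanchored, so `[0-9]{3}` also matches inside longer numbers, and `[0-9]{14}` matches any run of 14 or more digits.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines showing what each stage keeps or drops.
demo_chain() {
    printf '%s\n' \
        'req A took 123.4 ms' \
        'req B took 12.5 ms' \
        'Cycle Based step took 456.7 ms' \
        'ts 12345678901234.5 ms' \
        'req C took 1234.5 ms' |
        grep -E '[0-9]{3}\.[0-9]+ ms' |   # keep: 3 digits, '.', digits, ' ms'
        grep -v "Cycle Based" |           # drop: lines mentioning "Cycle Based"
        grep -Ev "[0-9]{14}\.[0-9]+ ms"   # drop: 14+ digits before the '.'
}

demo_chain
```

Only the `req A` and `req C` lines survive: `req B` has just two digits before the decimal point, the `Cycle Based` line is removed by `grep -v`, and the 14-digit value is removed by the last stage.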

Ole Tange · Accepted Answer

Usually the limiting factor when grepping a file is disk throughput, not grep's CPU work. If the file sits on a single disk, odds are that the disk is what is limiting you.
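One way to check whether the disk really is the limit is to time a plain read of the file and then the grep chain on the same file. This is a hedged sketch, not part of the answer: `compare_read_vs_grep` is a hypothetical helper, it assumes bash (for the `time` keyword and `TIMEFORMAT`), and it assumes the first read comes from disk rather than from the page cache.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: is the grep chain disk-bound or CPU-bound?
# The first read should be cold (not already cached in RAM).
compare_read_vs_grep() {
    local f=$1
    local TIMEFORMAT='%R s wall clock'   # bash: format for the time keyword

    echo "plain read:"
    time cat "$f" > /dev/null

    echo "grep chain:"
    # The time keyword measures the whole pipeline, not just the first grep
    time grep -E '[0-9]{3}\.[0-9]+ ms' "$f" | grep -v "Cycle Based" | grep -Ev "[0-9]{14}\.[0-9]+ ms" > /dev/null
}
```

Run it as `compare_read_vs_grep file.log`. If the grep chain takes about as long as the plain read, or less, the disk is the bottleneck and extra grep processes on the same disk will mostly wait for it; if the chain takes much longer, the CPU work dominates and splitting it across cores is more likely to pay off.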

However, if you have RAID10/50/60 or a distributed network filesystem, then parallelizing may speed up your processing:

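# Run the whole grep chain on whatever arrives on stdin (one chunk of the file)
doit() {
    grep -E '[0-9]{3}\.[0-9]+ ms' | grep -v "Cycle Based" | grep -Ev "[0-9]{14}\.[0-9]+ ms"
}
# Make the function visible to the shells GNU parallel starts
export -f doit
# --pipepart -a: split file.log into chunks at line boundaries and pipe each into doit
# --block -1: one chunk per job slot; -k: keep output in the original order
parallel --pipepart -a file.log --block -1 -k doit > pruned.log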
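To see what `--pipepart --block -1 -k` does, here is a rough coreutils-only analogue. It is a sketch under assumptions, not GNU parallel's implementation: `grep_chunks` is a hypothetical name, it relies on GNU `split -n l/K/N` (print only the K-th of N line-aligned chunks to stdout), and it uses `nproc` as the default chunk count, which roughly mirrors one block per job slot.

```shell
#!/usr/bin/env bash
# Hypothetical analogue of: parallel --pipepart -a SRC --block -1 -k doit > DST
# Requires GNU coreutils (split -n l/K/N, nproc).

doit() {
    grep -E '[0-9]{3}\.[0-9]+ ms' | grep -v "Cycle Based" | grep -Ev "[0-9]{14}\.[0-9]+ ms"
}

grep_chunks() {
    local src=$1 dst=$2 n=${3:-$(nproc)}
    local tmp k
    tmp=$(mktemp -d) || return 1

    # Start one background pipeline per chunk
    for ((k = 1; k <= n; k++)); do
        # l/K/N: emit only the K-th of N chunks, never splitting a line
        split -n "l/$k/$n" "$src" | doit > "$tmp/$k.out" &
    done
    wait   # block until every chunk has been filtered

    # Concatenate results in chunk order, as -k does
    for ((k = 1; k <= n; k++)); do
        cat "$tmp/$k.out"
    done > "$dst"
    rm -rf "$tmp"
}
```

For example, `grep_chunks file.log pruned.log` should write the same lines, in the same order, as the serial pipeline, because every line lands in exactly one chunk and the per-chunk outputs are concatenated in chunk order.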
