XNor

Reputation: 658

Filter lines by number of fields

I am filtering very long text files in Linux (usually > 1GB) to get only those lines I am interested in. I use this command:

cat ./my/file.txt | LC_ALL=C fgrep -f ./my/patterns.txt | $decoder > ./path/to/result.txt

$decoder is the path to a program I was given to decode these files. The problem now is that it only accepts lines with 7 fields, that is, 7 strings separated by spaces (e.g. "11 22 33 44 55 66 77"). Whenever a line with more or fewer fields is passed into this program, it crashes and I get a broken pipe error message.

To fix it, I wrote a super simple script in Bash:

while read -r line ; do
    if [[ $( echo "$line" | awk '{ print NF }') == 7 ]]; then
        echo "$line";
    fi;
done

But the problem is that now it takes ages to finish: before it took seconds, and now it takes ~30 minutes.

Does anyone know a better/faster way to do this? Thank you in advance.

Upvotes: 2

Views: 336

Answers (1)

konsolebox

Reputation: 75568

Well, perhaps you can insert awk in between instead. No need to rely on Bash:

LC_ALL=C fgrep -f ./my/patterns.txt ./my/file.txt | awk 'NF == 7' | "$decoder" > ./path/to/result.txt

Or perhaps awk can be the starting point; performance may be better that way:

awk 'NF == 7' ./my/file.txt | LC_ALL=C fgrep -f ./my/patterns.txt | "$decoder" > ./path/to/result.txt

You can merge fgrep and awk into a single awk command; however, I'm not sure whether that would affect anything that requires LC_ALL=C, or whether it would actually give better performance.
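
If you do want to try it, here is a minimal sketch of one awk invocation that does both the fixed-string filtering and the field-count check. It assumes patterns.txt contains one literal string per line, each matched as a plain substring the way fgrep -f matches them:

LC_ALL=C awk '
    NR == FNR { patterns[NR] = $0; next }   # first file: load the literal patterns
    NF == 7 {                               # second file: only consider 7-field lines
        for (i in patterns)
            if (index($0, patterns[i])) { print; break }
    }
' ./my/patterns.txt ./my/file.txt | "$decoder" > ./path/to/result.txt

Note that this scans the whole pattern list for every 7-field line, so with a large patterns.txt it may well end up slower than letting fgrep do the matching.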

Upvotes: 2
