user3150037

Reputation: 147

grep -vf too slow with large files

I am trying to filter data from data.txt using patterns stored in a file, filter.txt, like this:

grep -v -f filter.txt data.txt > op.txt

This grep takes 10-15 minutes or more with 30-40K lines in filter.txt and ~300K lines in data.txt.

Is there any way to speed this up?

data.txt

data1
data2
data3

filter.txt

data1

op.txt

data2
data3

The solution provided by codeforester works, but it fails when filter.txt is empty: op.txt comes out empty instead of containing every line of data.txt.
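To make the empty-file case concrete, here is a minimal sketch that reproduces it and tries one possible workaround. The scratch directory and the op_original.txt / op_fixed.txt names are made up for illustration. The workaround, testing FILENAME against ARGV[1] instead of comparing FNR with NR, is a common awk idiom and an assumption on my part, not something from the answer.

```shell
# Hypothetical scratch directory; the data follows the sample in the question.
tmp=$(mktemp -d)
printf 'data1\ndata2\ndata3\n' > "$tmp/data.txt"
: > "$tmp/filter.txt"    # empty filter file

# Original form: an empty first file contributes no records, so FNR==NR
# stays true while data.txt is read. Every data line is stored in hash
# and nothing is printed.
awk 'FNR==NR {hash[$0]; next} !($0 in hash)' \
    "$tmp/filter.txt" "$tmp/data.txt" > "$tmp/op_original.txt"

# Variant (assumption): recognise the filter file by its name rather than
# by record counts, so an empty filter.txt simply stores no keys.
awk 'FILENAME==ARGV[1] {hash[$0]; next} !($0 in hash)' \
    "$tmp/filter.txt" "$tmp/data.txt" > "$tmp/op_fixed.txt"

cat "$tmp/op_fixed.txt"
```

The variant relies on the two arguments being different paths; passing the same file twice would treat both reads as the filter.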

Upvotes: 5

Views: 2090

Answers (1)

codeforester

Reputation: 43109

Based on Inian's solution in the related post, this awk command should solve your issue:

awk 'FNR==NR {hash[$0]; next} !($0 in hash)' filter.txt data.txt > op.txt
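For readers unfamiliar with the idiom, here is the same command spread over several lines with comments, run against the sample data from the question in a hypothetical scratch directory. Note one difference from the original grep: awk compares whole lines as literal strings, while grep -v -f treats each filter line as a regular expression that may match anywhere in a line. The grep line at the end is an assumption about what might be comparable (-F for fixed strings, -x for whole-line matches); it is often much faster than regex matching, but I have not timed it on files of the size in the question.

```shell
# Hypothetical scratch directory with the sample data from the question.
tmp=$(mktemp -d)
printf 'data1\ndata2\ndata3\n' > "$tmp/data.txt"
printf 'data1\n' > "$tmp/filter.txt"

awk '
    # While the first file (filter.txt) is read, FNR equals NR.
    # Referencing hash[$0] creates the key; then skip to the next record.
    FNR == NR { hash[$0]; next }
    # For the second file (data.txt), print lines that are not stored keys.
    !($0 in hash)
' "$tmp/filter.txt" "$tmp/data.txt" > "$tmp/op.txt"

# Possible grep counterpart (assumption): fixed-string, whole-line matching.
grep -v -x -F -f "$tmp/filter.txt" "$tmp/data.txt" > "$tmp/op_grep.txt"

cat "$tmp/op.txt"
```

The array lookup makes each data line a single hash probe, which is why the run time no longer grows with the number of filter lines the way regex matching does.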

Upvotes: 7
