Reputation: 1996
Let's say I have a file DATA with 10,000,000 lines and another file IDS with 100,000 strings. I want to extract all lines from DATA that contain one of the strings from IDS. An additional condition is that there is a 1:1 relationship between the files: every ID matches exactly one line of DATA, and every line of DATA contains exactly one ID.
What is the most efficient, least complicated way to do this using standard Linux command-line utilities?
My ideas so far:
Upvotes: 0
Views: 384
Reputation: 9466
grep -F -f IDS DATA
Don't omit -F: it stops grep from interpreting the lines of IDS as regular expressions, and it enables the much more efficient Aho-Corasick algorithm.
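To see what -F changes, here is a small sketch on made-up data. The file names sample_DATA and sample_IDS and their contents are hypothetical stand-ins for the real DATA and IDS; run it in a scratch directory.

```shell
# Hypothetical sample files standing in for DATA and IDS.
printf '%s\n' 'id.1 alpha' 'idX1 beta' 'id.2 gamma' 'id.3 delta' > sample_DATA
printf '%s\n' 'id.1' 'id.3' > sample_IDS

# With -F, every line of sample_IDS is matched as a literal string.
grep -F -f sample_IDS sample_DATA > fixed.txt

# Without -F, the "." in "id.1" is a regex wildcard, so "idX1 beta" matches too.
grep -f sample_IDS sample_DATA > regex.txt

cat fixed.txt
```

Note that -F still matches substrings, so an ID such as id.1 would also select a line containing id.10. If the IDs can be prefixes of one another, adding -w (whole-word match) may help, depending on how the IDs are delimited in DATA.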
Upvotes: 3
Reputation: 11090
If IDS contains the exact strings you need to find in DATA, one string per line, try using
grep --file=IDS DATA > results
Upvotes: 2