Reputation: 604
I have a text file (we'll call it keywords.txt) that contains a number of strings separated by newlines (though this isn't set in stone; I can separate them with spaces, commas or whatever is most appropriate). I also have a number of other text files (which I will collectively call input.txt).
What I want to do is iterate through each line in input.txt and test whether that line contains one of the keywords. Then, depending on which input file I'm working on at the time, I need to either copy the matching lines from input.txt into output.txt and ignore the non-matching lines, or copy the non-matching lines and ignore the matching ones.
I searched for a solution and found ways to do parts of this, but nothing that covers everything I'm asking for here. I could try to combine the various solutions I found, but my main concern is that I would be left wondering whether the result is the best way of doing this.
This is a snippet of what I currently have in keywords.txt:
google
adword
chromebook.com
cobrasearch.com
feedburner.com
doubleclick
foofle.com
froogle.com
gmail
keyhole.com
madewithcode.com
Here is an example of what can be found in one of my input.txt files:
&expandable_ad_
&forceadv=
&gerf=*&guro=
&gIncludeExternalAds=
&googleadword=
&img2_adv=
&jumpstartadformat=
&largead=
&maxads=
&pltype=adhost^
In this snippet, &googleadword= is the only line that matches the filter. Depending on the scenario, output.txt will contain either only that matching line or every line that doesn't match any keyword.
Upvotes: 0
Views: 1131
Reputation: 1004
1. Assuming the content of keywords.txt is separated by newlines:
google
adword
chromebook.com
...
The following will work:
# Use keywords.txt as your pattern & copy matching lines in input.txt to output.txt
grep -Ff keywords.txt input.txt > output.txt
# Use keywords.txt as your pattern & copy non-matching lines in input.txt to output.txt
grep -vFf keywords.txt input.txt > output.txt
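As a quick check, the sketch below recreates cut-down versions of the two files from the question in a throwaway directory and runs both commands. The workdir variable, the matched.txt and unmatched.txt names and the abbreviated file contents are assumptions made for illustration only.

```shell
# Work in a throwaway directory so no real files are overwritten.
workdir=$(mktemp -d)

# Abbreviated copies of the files from the question.
printf '%s\n' google adword chromebook.com > "$workdir/keywords.txt"
printf '%s\n' '&expandable_ad_' '&forceadv=' '&googleadword=' '&maxads=' > "$workdir/input.txt"

# Copy only the lines that contain at least one keyword.
grep -Ff "$workdir/keywords.txt" "$workdir/input.txt" > "$workdir/matched.txt"

# Copy only the lines that contain none of the keywords.
grep -vFf "$workdir/keywords.txt" "$workdir/input.txt" > "$workdir/unmatched.txt"

cat "$workdir/matched.txt"     # prints: &googleadword=
cat "$workdir/unmatched.txt"   # prints the other three lines
```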
2. Assuming the content of keywords.txt is separated by vertical bars:
google|adword|chromebook.com|...
The following will work:
# Use keywords.txt as your pattern & copy matching lines in input.txt to output.txt
grep -Ef keywords.txt input.txt > output.txt
# Use keywords.txt as your pattern & copy non-matching lines in input.txt to output.txt
grep -vEf keywords.txt input.txt > output.txt
3. Assuming the content of keywords.txt is separated by commas:
google,adword,chromebook.com,...
There are many ways of achieving the same, but a simple way would be to use tr to replace all commas with vertical bars and then let grep interpret the result as an extended regular expression.
# Use keywords.txt as your pattern & copy matching lines in input.txt to output.txt
# (the command substitution is quoted so the pattern reaches grep as a single argument)
grep -E "$(tr ',' '|' < keywords.txt)" input.txt > output.txt
# Use keywords.txt as your pattern & copy non-matching lines in input.txt to output.txt
grep -vE "$(tr ',' '|' < keywords.txt)" input.txt > output.txt
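One thing to watch with the -E variants: a dot in a keyword such as chromebook.com acts as a regex wildcard there. Below is a minimal sketch of an alternative, assuming it is acceptable to write a newline-separated copy of the keywords and keep fixed-string matching. The workdir variable, the patterns.txt, fixed.txt and regex.txt names, and the chromebookXcom test line are made up for illustration.

```shell
workdir=$(mktemp -d)

# Illustrative data: a comma-separated keyword list and three input lines.
printf '%s\n' 'google,adword,chromebook.com' > "$workdir/keywords.txt"
printf '%s\n' '&googleadword=' 'chromebookXcom' '&maxads=' > "$workdir/input.txt"

# Turn the commas into newlines so that grep -F sees one keyword per line.
tr ',' '\n' < "$workdir/keywords.txt" > "$workdir/patterns.txt"

# Fixed-string matching: the dot in chromebook.com only matches a literal dot.
grep -Ff "$workdir/patterns.txt" "$workdir/input.txt" > "$workdir/fixed.txt"

# Extended-regex matching for comparison: the dot matches any character,
# so chromebookXcom is selected as well.
grep -E "$(tr ',' '|' < "$workdir/keywords.txt")" "$workdir/input.txt" > "$workdir/regex.txt"
```

Note that a trailing comma in keywords.txt would produce an empty pattern line, which matches every input line.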
Grep Options
-v, --invert-match  Selected lines are those not matching any of the specified patterns.
-F, --fixed-strings  Interpret each data-matching pattern as a list of fixed strings, separated by newlines, instead of as a regular expression.
-E, --extended-regexp  Interpret pattern as an extended regular expression (i.e. force grep to behave as egrep).
-f file, --file=file  Read one or more newline separated patterns from file. Empty pattern lines match every input line. Newlines are not considered part of a pattern. If file is empty, nothing is matched.
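Since the only difference between the two cases in the question is the -v flag, the options can be wrapped in a small helper. This is only a sketch: the function name filter_lines and its keep/drop modes are hypothetical, and it assumes a newline-separated keywords file.

```shell
# filter_lines MODE KEYWORDS INPUT OUTPUT
#   MODE "keep" copies lines containing a keyword to OUTPUT,
#   MODE "drop" copies lines containing no keyword to OUTPUT.
# (Hypothetical helper; the name and modes are not part of grep.)
filter_lines() {
    mode=$1
    keywords=$2
    input=$3
    output=$4
    case $mode in
        keep) grep -Ff "$keywords" "$input" > "$output" ;;
        drop) grep -vFf "$keywords" "$input" > "$output" ;;
        *)
            echo "usage: filter_lines keep|drop KEYWORDS INPUT OUTPUT" >&2
            return 2
            ;;
    esac
    # grep itself exits with status 1 when it selects no lines,
    # so callers may want to treat that case as "empty result" rather than failure.
}
```

It could then be called as filter_lines keep keywords.txt input.txt output.txt for one input file and filter_lines drop keywords.txt input.txt output.txt for another.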
Upvotes: 1