Reputation: 12230
FileA contains words and FileB contains strings.
How can I remove the lines from FileB that contain words found in FileA, preferably using sed, grep, or awk?
Sample FileA:
Word asdf
Word qwer
Word zxcv
Sample FileB:
https://www.webaddress.com/point?a=asdf
http://www.webaddress.com/point?a=pert
https://www.webaddress.com/point?a=njil
http://www.webaddress.com/point?a=qwer
http://www.webaddress.com/point?a=zxcv
So, FileB should be changed to:
http://www.webaddress.com/point?a=pert
https://www.webaddress.com/point?a=njil
Speed matters here because both FileA and FileB can be huge. Both files can be sorted first if that helps.
Upvotes: 1
Views: 98
Reputation: 123458
You could use grep:
grep -v -f <(awk '{print $2}' FileA) FileB > tmp && mv tmp FileB
As commented by Glenn Jackman, you could also use the -F option for grep, which makes it treat the patterns as fixed strings and is more efficient.
The <( ) syntax is called process substitution. Here it gives grep a file containing the list of words, i.e. the lines of FileA with the leading "Word" column removed.
The -f option for grep takes patterns from a file. The -v option inverts matches. So you get the lines in FileB that do not contain any word from the second column of FileA.
For your input, it'd produce:
http://www.webaddress.com/point?a=pert
https://www.webaddress.com/point?a=njil
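Besides speed, -F can also change the result when a word contains regex metacharacters. Here is a small sketch with made-up data; the file names words.txt, urls.txt, regex.out and fixed.out and the word a.c are hypothetical and not taken from the question:

```shell
# Hypothetical data: one word containing a regex metacharacter (the dot).
printf '%s\n' 'a.c' > words.txt
printf '%s\n' \
  'http://www.webaddress.com/point?a=abc' \
  'http://www.webaddress.com/point?a=a.c' \
  'http://www.webaddress.com/point?a=xyz' > urls.txt

# Without -F the dot matches any character, so the "abc" line is dropped too.
grep -v -f words.txt urls.txt > regex.out

# With -F only the literal string "a.c" is matched.
grep -F -v -f words.txt urls.txt > fixed.out
```

With this data, regex.out should contain only the xyz line, while fixed.out keeps both the abc and the xyz lines.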
Upvotes: 2
Reputation: 41446
Here is an awk solution:
awk 'FNR==NR{a[$2]++;next} {for (i in a) if ($0~i) next}8' FileA FileB
http://www.webaddress.com/point?a=pert
https://www.webaddress.com/point?a=njil
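The loop above tests every word against every line, which may be slow for huge files. If (and this is an assumption not stated in the question) the word is always the whole text after the last "a=" on each line, a hash lookup could replace the loop. A rough sketch under that assumption, with FileB.filtered as a hypothetical output name:

```shell
# Recreate the sample files from the question (run in a scratch directory).
printf '%s\n' 'Word asdf' 'Word qwer' 'Word zxcv' > FileA
printf '%s\n' \
  'https://www.webaddress.com/point?a=asdf' \
  'http://www.webaddress.com/point?a=pert' \
  'https://www.webaddress.com/point?a=njil' \
  'http://www.webaddress.com/point?a=qwer' \
  'http://www.webaddress.com/point?a=zxcv' > FileB

# ASSUMPTION: the word to test is everything after the last "a=" on the line.
awk '
  NR == FNR { words[$2]; next }        # first file: remember column 2 of FileA
  {
    n = split($0, parts, "a=")         # parts[n] is the text after the last "a="
    if (!(parts[n] in words)) print    # keep the line unless that text is a known word
  }
' FileA FileB > FileB.filtered

cat FileB.filtered
```

Note that this is an exact match on the extracted value, not a substring search, so it is only equivalent to the loop version when the assumption holds.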
Upvotes: 0
Reputation: 780724
grep -F -v -f <(sed 's/^Word //' FileA) FileB > FileB.new
-F means to match fixed strings rather than regular expressions.
-v means to output the lines that don't match.
-f means to take the list of strings to match from a filename.
<(command line) synthesizes a filename for the output of the command line.
The sed command removes the Word prefix from all the lines of FileA.
Upvotes: 3
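In shells without process substitution, the same idea can be spelled out with an explicit temporary file. A minimal sketch using the sample data from the question; the variable name words is just an illustration:

```shell
# Recreate the sample files from the question (run in a scratch directory).
printf '%s\n' 'Word asdf' 'Word qwer' 'Word zxcv' > FileA
printf '%s\n' \
  'https://www.webaddress.com/point?a=asdf' \
  'http://www.webaddress.com/point?a=pert' \
  'https://www.webaddress.com/point?a=njil' \
  'http://www.webaddress.com/point?a=qwer' \
  'http://www.webaddress.com/point?a=zxcv' > FileB

# Write the bare words to a temporary file instead of using <( ).
words=$(mktemp)
sed 's/^Word //' FileA > "$words"

# Keep only the FileB lines that contain none of the words as fixed strings.
grep -F -v -f "$words" FileB > FileB.new
rm -f "$words"

cat FileB.new
```

One thing to watch for: an empty line in the word list is an empty pattern, which matches every line, so with -v nothing would be printed. It may be worth stripping blank lines from FileA first.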