myusuf

Reputation: 12230

Delete line containing word from FileB if word found in FileA

FileA contains words and FileB contains strings.

How can I remove the lines from FileB that contain a word found in FileA, preferably using sed, grep, or awk?

Sample FileA:

Word asdf
Word qwer
Word zxcv

Sample FileB:

https://www.webaddress.com/point?a=asdf
http://www.webaddress.com/point?a=pert
https://www.webaddress.com/point?a=njil
http://www.webaddress.com/point?a=qwer
http://www.webaddress.com/point?a=zxcv

So, FileB should be changed to:

http://www.webaddress.com/point?a=pert
https://www.webaddress.com/point?a=njil

Speed is an issue here as both FileA and FileB can be huge. FileA and FileB can be sorted etc. if required.

Upvotes: 1

Views: 98

Answers (3)

devnull

Reputation: 123458

You could use grep:

grep -v -f <(awk '{print $2}' FileA) FileB > tmp && mv tmp FileB

As Glenn Jackman commented, you could also add grep's -F option, which makes it treat the patterns as fixed strings rather than regular expressions and is more efficient.

The <( ) syntax is called process substitution. It gives grep a file name from which it can read the output of the awk command, which is the second column of FileA, i.e. each word with the "Word" prefix removed.

The -f option for grep takes patterns from a file. The -v option inverts the match. So you get the lines in FileB that do not contain any word from the second column of FileA.

For your input, it'd produce:

http://www.webaddress.com/point?a=pert
https://www.webaddress.com/point?a=njil
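
As a rough sketch of the -F variant mentioned above, the following runs the same idea against the sample data from the question. The scratch directory and the intermediate file name words.txt are assumptions made for illustration; the pattern file stands in for the process substitution so that the sketch also runs in shells that lack <( ).

```shell
# Hypothetical scratch directory holding copies of the question's sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'Word asdf' 'Word qwer' 'Word zxcv' > FileA
printf '%s\n' \
  'https://www.webaddress.com/point?a=asdf' \
  'http://www.webaddress.com/point?a=pert' \
  'https://www.webaddress.com/point?a=njil' \
  'http://www.webaddress.com/point?a=qwer' \
  'http://www.webaddress.com/point?a=zxcv' > FileB

# Second column of FileA: the bare words, one per line.
awk '{print $2}' FileA > words.txt

# -F: fixed strings, -v: keep non-matching lines, -f: read patterns from words.txt.
# grep exits non-zero when it selects no lines, so FileB is left untouched
# if every line would be removed.
grep -F -v -f words.txt FileB > tmp && mv tmp FileB

result=$(cat FileB)
printf '%s\n' "$result"
```

Note that the && means FileB is only overwritten when at least one line survives the filter; if an empty result should also be written back, the exit status would need to be handled differently.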

Upvotes: 2

Jotne

Reputation: 41446

Here is an awk solution:

awk 'FNR==NR{a[$2]++;next} {for (i in a) if ($0~i) next}8' FileA FileB
http://www.webaddress.com/point?a=pert
https://www.webaddress.com/point?a=njil
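
Since speed was raised as a concern, here is a sketch of a different technique: a single hash lookup per line instead of a loop over every word. It is not equivalent to the command above. It assumes that the word to test is always the text after the last = on each FileB line, as in the sample data, and it matches that value exactly rather than as a substring. The scratch directory and the array names seen and parts are illustrative.

```shell
# Hypothetical scratch directory holding copies of the question's sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'Word asdf' 'Word qwer' 'Word zxcv' > FileA
printf '%s\n' \
  'https://www.webaddress.com/point?a=asdf' \
  'http://www.webaddress.com/point?a=pert' \
  'https://www.webaddress.com/point?a=njil' \
  'http://www.webaddress.com/point?a=qwer' \
  'http://www.webaddress.com/point?a=zxcv' > FileB

# First file: remember each word (second column) as an array key.
# Second file: split the line on "=", take the last piece, and print the
# line only if that piece is not one of the remembered words.
# ASSUMPTION: the value after the last "=" is the word to compare.
kept=$(awk '
  NR == FNR { seen[$2]; next }
  {
    n = split($0, parts, "=")
    if (!(parts[n] in seen)) print
  }
' FileA FileB)

printf '%s\n' "$kept"
```

If the words can appear anywhere in the line, this shortcut does not apply and a substring search such as the grep -F approach is the safer choice.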

Upvotes: 0

Barmar

Reputation: 780724

grep -F -v -f <(sed 's/^Word //' FileA) FileB > FileB.new
  • -F means to match fixed strings rather than regular expressions.
  • -v means to output the lines that don't match.
  • -f means to take the list of strings to match from a file.
  • <(command line) synthesizes a filename for the output of the command line
  • The sed command removes the Word prefix from all the lines of FileA.
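
To illustrate the -F point in the list above, here is a small sketch with made-up data (the pattern a.c and the three lines are not from the question). It shows how a word containing a regular-expression metacharacter behaves with and without -F.

```shell
# Hypothetical data: one pattern containing a regex metacharacter.
workdir=$(mktemp -d)
printf '%s\n' 'a.c' > "$workdir/patterns"
printf '%s\n' 'x=abc' 'x=a.c' 'x=xyz' > "$workdir/lines"

# As a regular expression, "." matches any character,
# so both x=abc and x=a.c are dropped.
regex_kept=$(grep -v -f "$workdir/patterns" "$workdir/lines")

# As a fixed string, only the line containing the literal text a.c is dropped.
fixed_kept=$(grep -F -v -f "$workdir/patterns" "$workdir/lines")

printf 'without -F:\n%s\n' "$regex_kept"
printf 'with -F:\n%s\n' "$fixed_kept"
```

So besides being faster, -F avoids removing lines by accident when the words in FileA contain characters such as ., * or [.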

Upvotes: 3
