KarthicdotK

Reputation: 53

grep reverse with exact matching

I have a list file containing an id and a number on each line, and I am trying to get the lines from a master file whose ids are not in that list.

List file

nw_66 17296
nw_67 21414
nw_68 21372
nw_69 27387
nw_70 15830
nw_71 32348
nw_72 21925
nw_73 20363

master file

nw_1 5896
nw_2 52814
nw_3 14537
nw_4 87323
nw_5 56466
......
......
nw_n xxxxx

So far I have tried this, but it does not work as expected:

for i in $(awk '{print $1}' list.txt); do grep -v -w $i master.txt; done;

Kindly help

Upvotes: 1

Views: 275

Answers (3)

kvantour

Reputation: 26591

The OP attempted the following line:

for i in $(awk '{print $1}' list.txt); do grep -v -w $i master.txt; done;

This line will not work because, for every entry $i, it prints all lines in master.txt that do not match "$i". As a consequence, you end up with multiple copies of master.txt, each missing only the lines that match one id.

Example:

$ for i in 1 2; do grep -v -w "$i" <(seq 1 3); done
2     \ copy of seq 1 3 without entry 1
3     /
1     \ copy of seq 1 3 without entry 2
3     /

Furthermore, the attempt reads master.txt once for every id in list.txt, which is very inefficient.

The Unix tool grep can check multiple expressions stored in a file in a single pass. This is done with the -f flag. Normally this looks like:

$ grep -f list.txt master.txt

The OP can use this now in the following way:

$ grep -vwf <(awk '{print $1}' list.txt) master.txt

However, this matches each id as a whole word anywhere in the line, not only in the first column.
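If you want to stay with grep, one way to restrict the match to the first column is to turn each id into an anchored pattern. This is only a minimal sketch: it assumes the ids contain no regex metacharacters and that columns are separated by whitespace. The temporary files and the nw_20 line are made up for illustration.

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'nw_2 52814' > "$tmp/list.txt"
printf '%s\n' 'nw_1 5896' 'nw_2 52814' 'nw_3 14537' 'nw_20 11111' > "$tmp/master.txt"

# Build one pattern per id: start of line, the id, then a whitespace character.
awk '{print "^" $1 "[[:space:]]"}' "$tmp/list.txt" > "$tmp/patterns.txt"

# -v inverts the match, -f reads the patterns from the file.
anchored_out=$(grep -vf "$tmp/patterns.txt" "$tmp/master.txt")
printf '%s\n' "$anchored_out"

rm -rf "$tmp"
```

Because the pattern ends in a whitespace character, nw_2 does not accidentally remove nw_20.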

The awk solution presented by Kent is more flexible and allows the OP to define a more tuned match:

awk 'NR==FNR{a[$1]=1;next}!a[$1]' list master

Here the script states the intent explicitly: match column 1 of list against column 1 of master, regardless of spacing or of whatever is in column 2. The grep solution could still match entries in column 2.
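To make the difference visible, here is a small comparison on made-up data in which the ids are plain numbers, so an id can also appear as a value in column 2. The file names and contents are hypothetical, not taken from the question.

```shell
# Hypothetical data: id 100 is in the list and also appears in column 2 of master.
tmp=$(mktemp -d)
printf '%s\n' '100 5' > "$tmp/list.txt"
printf '%s\n' '100 5' '200 100' '300 7' > "$tmp/master.txt"

# grep: drop every line that contains a listed id as a whole word, in any column.
awk '{print $1}' "$tmp/list.txt" > "$tmp/ids.txt"
grep_out=$(grep -vwf "$tmp/ids.txt" "$tmp/master.txt")

# awk: drop only the lines whose first column is a listed id.
awk_out=$(awk 'NR==FNR{a[$1]=1;next}!a[$1]' "$tmp/list.txt" "$tmp/master.txt")

printf 'grep:\n%s\nawk:\n%s\n' "$grep_out" "$awk_out"
rm -rf "$tmp"
```

The grep version also drops the line "200 100" because 100 occurs in its second column, while the awk version keeps it.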

Upvotes: 0

Kent

Reputation: 195289

Give this awk one-liner a try:

awk 'NR==FNR{a[$1]=1;next}!a[$1]' list master

Upvotes: 1

Mihir Luthra

Reputation: 6779

Maybe this helps:

awk 'NR == FNR {id[$1]=1;next}
{
    if (id[$1] == "") {
        print $0
    }
}' listfile masterfile

The command takes two files as input: the first is listfile, the second is masterfile.

NR == FNR is true only while awk is reading the first file, listfile. During that pass, every id in listfile is stored as a key in the associative array id[] with the value 1.

When awk goes through masterfile, it prints a line only if $1, i.e. the id, is not a key in the array id.
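A slightly different way to write the same idea uses awk's in operator, which tests whether a key exists without creating it (the comparison id[$1] == "" adds an empty entry for every master id it looks up). This is a minimal sketch on hypothetical sample files, not a replacement for the command above:

```shell
# Hypothetical sample data; the master lines are made up for illustration.
tmp=$(mktemp -d)
printf '%s\n' 'nw_66 17296' 'nw_67 21414' > "$tmp/listfile"
printf '%s\n' 'nw_1 5896' 'nw_66 17296' 'nw_2 52814' 'nw_67 21414' > "$tmp/masterfile"

# First file: record each id as a key. Second file: print lines whose id is not a key.
kept=$(awk 'NR == FNR { id[$1]; next } !($1 in id)' "$tmp/listfile" "$tmp/masterfile")
printf '%s\n' "$kept"

rm -rf "$tmp"
```

On this data only the nw_1 and nw_2 lines are printed.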

Upvotes: 0
