molika sinha

Reputation: 13

Comparing two files in AWK

I have two .txt files, and I want to check whether the contents of one are present in the other. The contents of Book1.txt are:

PATX248
PATX216
PATX203
PATX219B
PATX212
PATX248
PATX211
PATX190
PATX222
PATX241
B8025
B1003
B8063
B8032
C0999
C1035
B1011

My InventorySheet2finaloutput.txt is:

B8061P3 366-L4/26/2017 1
PATX-148 P3 4
 1003P4 M#1N-L1/19/2017
B1011P5 330-L2/23/2017 1
B8032P3 336-L3/10/2017 1
B1011P5 329-L2/14/2017 1
PATX-60P5 279-L2/8/2017 1
PATX-70 P3 5
B1573P6 1R-R8/10/2017 1
B8025 P4 5
B8025 P5 1
 1061P3 372-R4/26/2017
 2078 P4M#1RR-R8/25/2017
C0999 P5 4
B8078 P4M#1N-R8/25/2017 2
C-1008 P4 1
PATX-55 P4 4
B1003P5 325-R3/3/2017 1
PATX-45P4 266-L2/14/2017 1
B8032P4 384-R4/26/2017 1
C-1035 P3 1
B8032P3 340-R3/17/2017 1

Expected output:

B1003P5 325-R3/3/2017 1
B8032P3 336-L3/10/2017 1
B8032P4 384-R4/26/2017 1
B8032P3 340-R3/17/2017 1
C0999 P5 4
C-1035 P3 1
B1011P5 330-L2/23/2017 1
B1011P5 329-L2/14/2017 1

I have tried every solution I could find on Google. They all run without errors, but none of them prints any result. The solutions I tried are:

  1. grep -v -F -x -f Book1.txt InventorySheet2finaloutput.txt (I also tried other combinations of grep flags)
  2. awk 'NR == FNR {Book1[$0]++; next} ($0 in Book1)' Book1.txt InventorySheet2finaloutput.txt
  3. awk 'NR==FNR{a[$1];next}$1 in a{print $1}' Book1.txt InventorySheet2finaloutput.txt
  4. grep "$(cat Book1.txt)" InventorySheet2finaloutput.txt

I want to find out whether the contents of Book1 are present in InventorySheet.

Upvotes: 1

Views: 94

Answers (2)

Ed Morton

Reputation: 203209

As best I can tell, this does what you say you want, and the expected output posted in your question is wrong:

$ cat tst.awk
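{
    # Normalize the first field: drop hyphens and other punctuation,
    # then keep only the leading letters-plus-digits ID (e.g. C-1035 -> C1035).
    key=$1
    gsub(/[^[:alnum:]]/,"",key)
    match(key,/^[[:upper:]]+[[:digit:]]+/)
    key = substr(key,RSTART,RLENGTH)
}
NR==FNR { keys[key]; next }   # first file: remember each key
key in keys                   # second file: print lines whose key was seen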

$ awk -f tst.awk Book1.txt Inventory.txt
B1011P5 330-L2/23/2017 1
B8032P3 336-L3/10/2017 1
B1011P5 329-L2/14/2017 1
B8025 P4 5
B8025 P5 1
C0999 P5 4
B1003P5 325-R3/3/2017 1
B8032P4 384-R4/26/2017 1
C-1035 P3 1
B8032P3 340-R3/17/2017 1
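
To see why an entry such as PATX219B or a line starting with C-1035 ends up matching, it can help to print the key that the script derives from each first field. The following is a small sketch that reuses the same normalization steps as tst.awk on a few lines taken from the question's data; the printf input and the `->` output format are only for illustration and are not part of the script above.

```shell
# A few sample lines from the question, piped through the same
# normalization steps as tst.awk (illustrative input, not the full files).
keys=$(printf '%s\n' \
        'PATX219B' \
        'C-1035 P3 1' \
        'B1011P5 330-L2/23/2017 1' \
        ' 1003P4 M#1N-L1/19/2017' |
    awk '{
        key = $1
        gsub(/[^[:alnum:]]/, "", key)             # drop hyphens and other punctuation
        match(key, /^[[:upper:]]+[[:digit:]]+/)   # leading letters followed by digits
        key = substr(key, RSTART, RLENGTH)        # empty when match() finds nothing
        printf "%s -> [%s]\n", $1, key
    }')
printf '%s\n' "$keys"
```

This should print `PATX219B -> [PATX219]`, `C-1035 -> [C1035]`, `B1011P5 -> [B1011]` and `1003P4 -> []`. The last line has no leading letters, so its key is empty and it can never match a Book1 entry.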

Upvotes: 0

glenn jackman

Reputation: 246754

Oh, I get it now: the contents of Book1 are supposed to be the prefix (with, it seems, an optional hyphen) of the lines of InventorySheet. So, given B1003 in Book1 we match the B1003P5 line in InventorySheet. Or C1035 matches C-1035.

grep -Ef <(sed -E 's/^/^/; s/([[:alpha:]])([[:digit:]])/\1-?\2/' Book1) InventorySheet

That uses sed to generate the extended regular expressions from the Book1 file, and the process substitution allows us to hand grep a "pseudo-filename".
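
To see what grep actually receives as its pattern list, the sed script can be run on its own. This is a minimal sketch using three entries from Book1 supplied via printf (an illustrative stand-in for the real file):

```shell
# Feed three Book1 entries through the same sed script and capture
# the generated extended regular expressions.
patterns=$(printf '%s\n' B1003 C1035 PATX219B |
    sed -E 's/^/^/; s/([[:alpha:]])([[:digit:]])/\1-?\2/')
printf '%s\n' "$patterns"
```

The generated patterns should be `^B-?1003`, `^C-?1035` and `^PATX-?219B`: each one is anchored to the start of the line, with an optional hyphen between the first letter-digit pair.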

Given your sample files, the grep command outputs

B1011P5 330-L2/23/2017 1
B8032P3 336-L3/10/2017 1
B1011P5 329-L2/14/2017 1
B8025 P4 5
B8025 P5 1
C0999 P5 4
B1003P5 325-R3/3/2017 1
B8032P4 384-R4/26/2017 1
C-1035 P3 1
B8032P3 340-R3/17/2017 1

In awk, the same prefix match would be

awk '
    NR==FNR {book[$1]; next}
    { 
        key=$1
        gsub(/-/, "", key)
        for (b in book) 
            if (key ~ "^"b) {print; break}
    }
' Book1 InventorySheet
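
One caveat with `key ~ "^"b`: each Book1 entry is interpreted as a regular expression. The sample IDs are purely alphanumeric, so it makes no difference here, but if an entry ever contained a character such as `.` or `+`, a literal prefix test may be safer. Below is a sketch of that variation (not part of the command above) using awk's `index()`; the scratch directory and the trimmed three-line files are hypothetical stand-ins for the real data.

```shell
# Hypothetical scratch files holding a few lines from the question's data.
tmp=$(mktemp -d)
printf '%s\n' B1003 C1035 PATX248 > "$tmp/Book1"
printf '%s\n' 'B1003P5 325-R3/3/2017 1' 'C-1035 P3 1' 'PATX-148 P3 4' \
    > "$tmp/InventorySheet"

matches=$(awk '
    NR==FNR {book[$1]; next}
    {
        key = $1
        gsub(/-/, "", key)
        for (b in book)
            # index() returns 1 only when key starts with the literal string b
            if (index(key, b) == 1) {print; break}
    }
' "$tmp/Book1" "$tmp/InventorySheet")
printf '%s\n' "$matches"
rm -rf "$tmp"
```

With these three lines, the B1003P5 and C-1035 lines should be printed, while PATX-148 is skipped because PATX148 does not start with any Book1 entry.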

Upvotes: 1
