Aarthi
Aarthi

Reputation: 13

Compare two delimited files field by field and find the missing and non matching records

Two input files each having 3 fields. The first two fields in both the files have to matched and the third field has to be compared.

    File1
A ; 1 ; a1
B ; 2 ; b2
C ; 3 ; c3
A ; 4 ; a4


 File 2
B ; 2 ; b2
C ; 3 ; c5
E ; 5 ; e5

I want output like below.

Mismatching: 
C ; 3 ; c3

Lines missing in file1:
E ; 5 ; e5

Lines missing in file2: 
A ; 1 ; a1
A ; 4 ; a4

I also want the records missing in file1 and file2.

I tried

awk 'BEGIN {FS = ";"} NR==FNR{a[$1,$2] = $3; next} (a[$1,$2] != $3)' file1 file2

but this is giving me only the rows in file2 which are not present in file1..

Upvotes: 1

Views: 125

Answers (1)

Sundeep
Sundeep

Reputation: 23707

$ awk -F';' '
   NR==FNR{a[$1","$2]=$0; next}

   $1","$2 in a{if(a[$1","$2] != $0)mm=mm $0 RS; delete a[$1","$2]; next}
   {nf=nf $0 RS}

   END{print "Mismatching:\n" mm;
       print "Lines missing in file1:"; for(i in a)print a[i];
       print "\nLines missing in file2:\n" nf}
   ' file2 file1
Mismatching:
C ; 3 ; c3

Lines missing in file1:
E ; 5 ; e5

Lines missing in file2:
A ; 1 ; a1
A ; 4 ; a4
  • $1","$2 in a if first two fields are found in a
    • if value in a doesn't match current line, append the line to variable mm (mismatch lines)
    • delete the key from a so that at the end whichever keys were not called upon will give missing lines
  • nf=nf $0 RS if the key wasn't found in a then we get lines not found in first file argument passed to awk
  • END{...} print as required


Better to save code in a file and call it using -f

$ cat cmp.awk 
NR==FNR{a[$1","$2]=$0; next}

$1","$2 in a{if(a[$1","$2] != $0)mm=mm $0 RS; delete a[$1","$2]; next}
{nf=nf $0 RS}

END{print "Mismatching:\n" mm;
    print "Lines missing in file1:"; for(i in a)print a[i];
    print "\nLines missing in file2:\n" nf}

$ awk -F';' -f cmp.awk file2 file1

Upvotes: 0

Related Questions