Mike
Mike

Reputation: 5

Compare two columns from two different files and print non-matching and new separately

I would like to compare two columns from two multicolumn TABBed files (can be CSVed, if needed) (4th and 8th columns) and print non-matching and new separately

table 1 (showing 4th, 8th columns)
100001 28/10/2020
100002 15/03/2017

table 2 (showing 4th, 8th columns)
100001 28/10/2020
100002 18/03/2017
100003 07/12/2018

print non-matching // since data in 8th column is different (date, three days later)

100002 18/03/2017

print new // sice the record does not exist in table 1

100003 07/12/2018

I've tried different approaches from stackoverflow, reviewed similar questions and googled. This stackoverflow thread is the closest I have found but still no success

This one-liner

awk '{k=$4 FS $8} NR==FNR{a[k]; next} !(k in a)' file1 file2

will output new !(k in a), but not non-matching. However, 7 lines should be printed as new instead of 5. I wonder whether the date slashes / in the 8th column may affect. And still, it does not help with non-matching.

Edit: updated as per @ed-morton advice:

user@machine:~/testing$ cat file1
abc 123 456 100001  def ghi 789 28/10/2020
ujl 764 678 100002  htg dcf 065 15/03/2017
user@machine:~/testing$ cat file2
abc 123 456 100001  def ghi 789 28/10/2020
ujl 764 678 100002  htg dcf 065 18/03/2017
rfg 453 287 100003  grf cfe 764 07/12/2018
user@machine:~/testing$ awk 'NR==FNR{a[$4]=$8; next} a[$4] != $8' file1 file2
ujl 764 678 100002  htg dcf 065 18/03/2017
rfg 453 287 100003  grf cfe 764 07/12/2018

Upvotes: 0

Views: 557

Answers (1)

Ed Morton
Ed Morton

Reputation: 203209

This might be what you want (untested):

awk 'NR==FNR{a[$4]=$8; next} a[$4] != $8' file1 file2

Slashes in the 8th column or anywhere else or any other specific characters in the input make no difference, the code you started with and this code are just doing literal string comparisons.

Upvotes: 2

Related Questions