Reputation: 5
I would like to compare two columns from two multicolumn TABBed files (can be CSVed, if needed) (4th and 8th columns) and print non-matching and new separately
table 1 (showing 4th, 8th columns)
100001 28/10/2020
100002 15/03/2017
table 2 (showing 4th, 8th columns)
100001 28/10/2020
100002 18/03/2017
100003 07/12/2018
print non-matching // since data in 8th column is different (date, three days later)
100002 18/03/2017
print new // sice the record does not exist in table 1
100003 07/12/2018
I've tried different approaches from stackoverflow, reviewed similar questions and googled. This stackoverflow thread is the closest I have found but still no success
This one-liner
awk '{k=$4 FS $8} NR==FNR{a[k]; next} !(k in a)' file1 file2
will output new !(k in a), but not non-matching. However, 7 lines should be printed as new instead of 5. I wonder whether the date slashes / in the 8th column may affect. And still, it does not help with non-matching.
Edit: updated as per @ed-morton advice:
user@machine:~/testing$ cat file1
abc 123 456 100001 def ghi 789 28/10/2020
ujl 764 678 100002 htg dcf 065 15/03/2017
user@machine:~/testing$ cat file2
abc 123 456 100001 def ghi 789 28/10/2020
ujl 764 678 100002 htg dcf 065 18/03/2017
rfg 453 287 100003 grf cfe 764 07/12/2018
user@machine:~/testing$ awk 'NR==FNR{a[$4]=$8; next} a[$4] != $8' file1 file2
ujl 764 678 100002 htg dcf 065 18/03/2017
rfg 453 287 100003 grf cfe 764 07/12/2018
Upvotes: 0
Views: 557
Reputation: 203209
This might be what you want (untested):
awk 'NR==FNR{a[$4]=$8; next} a[$4] != $8' file1 file2
Slashes in the 8th column or anywhere else or any other specific characters in the input make no difference, the code you started with and this code are just doing literal string comparisons.
Upvotes: 2