Reputation: 396
The requirement is to compare two huge Unix files and write the differences to a third file, based on a unique key (the first field). After looking at a few options, I ended up with the command below:
awk 'FNR==NR{a[$0];next}!($0 in a)' hosts.csv masterlist.csv>results.csv
This does give the differences. However, if for some field one file contains NULL (as a word) and the other contains an empty string or a space for null values, how can the command ignore that mismatch and still compare the other fields?
I would also like to turn this into a generic script or utility with such options. I don't need the code; a suggestion would be helpful.
Upvotes: 1
Views: 234
Reputation: 14955
You can try this fix in your awk command:
awk 'FNR==NR{if ($0 !~ /NULL|^ *$/){a[$0]}next}!($0 in a)' hosts.csv masterlist.csv>results.csv
As @fedorqui suggests in the comments, here's another alternative:
awk 'FNR==NR{if ($0 !~ /NULL/ && NF){a[$0]}next}!($0 in a)' hosts.csv masterlist.csv>results.csv
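Note that the commands above skip whole lines of hosts.csv that contain NULL or are blank; they do not make NULL and an empty field compare as equal. If that is what you are after, one possible approach is to normalize every field before the lookup. This is only a sketch under assumptions: the comma delimiter is guessed from the .csv file names, the sample rows are made up, and `norm` is a helper name I introduced for illustration.

```shell
# Hypothetical sample data (not from the question); comma delimiter is an assumption.
printf '%s\n' '1,NULL,web' '2,db,' '3,app,x' > hosts.csv
printf '%s\n' '1,,web' '2,db, ' '3,app,y' '4,new,z' > masterlist.csv

awk -F, '
# norm(): rebuild the current record with every field that is the word NULL
# or only whitespace replaced by an empty string. i, f, s are locals.
function norm(   i, f, s) {
    s = ""
    for (i = 1; i <= NF; i++) {
        f = $i
        if (f == "NULL" || f ~ /^[ \t]*$/) f = ""
        s = s (i > 1 ? FS : "") f
    }
    return s
}
FNR == NR { seen[norm()]; next }   # first file: remember normalized records
!(norm() in seen)                  # second file: print original lines not seen
' hosts.csv masterlist.csv > results.csv
```

The lines written to results.csv are the original, unmodified lines from masterlist.csv; only the lookup key is normalized. Like the original command, this compares whole records rather than just the first field.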
Upvotes: 2
Reputation: 41
Try comparing them in binary form. If you serialize the files into a binary representation, you can compare them quite rapidly. If there is a difference, you can then go through the files and compare them using methods similar to the ones git uses; check its source code. Hope this helps.
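As a rough illustration of the "fast check first" idea, and not of how git does it, a byte-level comparison with the standard `cmp` utility can tell you whether a detailed comparison is needed at all. The sample contents and the `status` variable below are made up for the example.

```shell
# Hypothetical sample contents, just to make the sketch runnable.
printf 'a\nb\n' > hosts.csv
printf 'a\nc\n' > masterlist.csv

# cmp -s prints nothing and exits 0 only when the files are byte-for-byte identical.
if cmp -s hosts.csv masterlist.csv; then
    status=identical
else
    status=different   # only now is a line-by-line comparison worth running
fi
echo "$status"
```

Keep in mind that this check treats NULL and an empty field as different bytes, so it cannot replace a field-aware comparison.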
Upvotes: 0