Reputation: 396

Comparing two huge files in Unix

The requirement is to compare two huge files on Unix and write the differences to a third file, based on a unique key (the first field). After searching through a few options, I found the command below:

awk 'FNR==NR{a[$0];next}!($0 in a)' hosts.csv masterlist.csv>results.csv

Although this gives the differences, one file may contain NULL (as a word) in a field where the other file has an empty or blank value. How can I ignore this in the command and still compare the other fields?

I would also like to build a generic script or utility with such options. I don't need the code; a suggestion would be helpful.

Upvotes: 1

Answers (2)

Juan Diego Godoy Robles

Reputation: 14955

You can try this fix in your awk command, which skips lines of the first file that contain NULL or are empty or blank:

awk 'FNR==NR{if ($0 !~ /NULL|^ *$/){a[$0]}next}!($0 in a)' hosts.csv masterlist.csv>results.csv

As @fedorqui suggests in the comments, here's another alternative:

awk 'FNR==NR{if ($0 !~ /NULL/ && NF){a[$0]}next}!($0 in a)' hosts.csv masterlist.csv>results.csv
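Both commands above skip whole lines of the first file. If the goal is instead to treat NULL and an empty or blank field as the same value while still comparing the other fields, one possible approach is to build a normalized copy of each line and compare on that. The following is only a sketch under assumptions not stated in the question: the files are comma-separated, NULL appears as a complete field, and the sample data and the names `workdir`, `key` and `seen` are made up for illustration.

```shell
# Hypothetical sample data (assumption: comma-separated fields).
workdir=$(mktemp -d)
printf '1,alice,NULL\n2,bob,x\n' > "$workdir/hosts.csv"
printf '1,alice,\n2,bob,y\n3,carol, \n' > "$workdir/masterlist.csv"

awk -F, '
{
    # Build a comparison key in which a NULL field and an
    # empty or blank field both become the empty string.
    # $0 itself is left untouched so output lines stay as they were.
    key = ""
    for (i = 1; i <= NF; i++) {
        f = $i
        if (f == "NULL" || f ~ /^[ \t]*$/) f = ""
        key = (i == 1) ? f : key "," f
    }
}
FNR == NR { seen[key]; next }   # first file: remember normalized lines
!(key in seen)                  # second file: print lines not seen
' "$workdir/hosts.csv" "$workdir/masterlist.csv" > "$workdir/results.csv"

cat "$workdir/results.csv"
```

With this sample data, the line for key 1 is treated as equal in both files, so only the lines for keys 2 and 3 end up in results.csv. For a generic utility, the field separator and the list of tokens to treat as null could be passed in with `-v` variables rather than hard-coded.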

Upvotes: 2

jjscloud

Reputation: 41

Try comparing them as binary. If you compress each file into a binary form (serialization), you can then compare them quite rapidly. If there is a difference, you can then go through the files and compare them using methods similar to Git's; check its source code. Hope this helps.
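One way to read this suggestion is as a cheap byte-level check before any line-by-line comparison. The sketch below uses the standard `cmp` utility for that; it is an interpretation, not necessarily what the answer had in mind, and the file names and the helper `same` are hypothetical.

```shell
# Hypothetical sample files.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/one.csv"
printf 'a\nb\n' > "$tmp/two.csv"
printf 'a\nc\n' > "$tmp/three.csv"

# cmp -s prints nothing; it exits 0 only when both files are byte-identical.
same() { cmp -s "$1" "$2"; }

if same "$tmp/one.csv" "$tmp/two.csv"; then
    echo "identical, nothing more to do"
else
    echo "different, run the detailed comparison"
fi
```

A byte-level check only says whether the files differ at all, so a NULL versus blank difference still counts as a difference here; the detailed comparison is still needed when it reports a mismatch.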

Upvotes: 0
