How can I check for duplicate files using md5sum in Perl in an if statement?
I am looking for a line of code that does this:
if (md5 of new file matches any md5sum of already-parsed files) {
    print "duplicate found"
} else {
    # new file: add its md5sum to the list for later checks
    print "new file"
}
The basic idea is to calculate a hash-code for each file you encounter. In pseudo-code:
my %md5_to_file;
for every file
    push @{ $md5_to_file{ md5 of file } }, file
Then any entry in %md5_to_file whose list holds more than one file points to possible duplicates. You can then do further checks (for example, a byte-by-byte comparison) to determine whether you have hash collisions or genuine duplicates.
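A minimal runnable version of the pseudo-code above, as a sketch: it assumes the file names are passed on the command line, and the helper names `md5_of_file` and `group_by_md5` are hypothetical. It uses the core modules Digest::MD5 and File::Compare; `compare` returns 0 for identical files, 1 for different ones and -1 on error, which serves as the "further check" for collisions.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Compare qw(compare);

# Hypothetical helper: hex MD5 digest of a file's contents.
sub md5_of_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Hypothetical helper: map each digest to the list of files that produced it.
sub group_by_md5 {
    my %md5_to_file;
    for my $file (@_) {
        push @{ $md5_to_file{ md5_of_file($file) } }, $file;
    }
    return \%md5_to_file;
}

my $groups = group_by_md5(@ARGV);
for my $md5 ( sort keys %$groups ) {
    # A digest shared by more than one file marks possible duplicates.
    my ( $first, @others ) = @{ $groups->{$md5} };
    for my $other (@others) {
        my $cmp = compare( $first, $other );
        if ( $cmp == 0 ) {
            print "duplicate: $other matches $first\n";
        }
        elsif ( $cmp == 1 ) {
            print "MD5 collision: $other differs from $first\n";
        }
        else {
            warn "could not compare $first and $other\n";
        }
    }
}
```

Hashing first and comparing bytes only within a group keeps the expensive full comparison limited to the few files that already share a digest.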
See also DFW Perl Mongers ONLINE Hackathon Smackdown - Results, Awards, And Code.
Generally the idiomatic way of performing this operation is to use a hash.
use strict;
use warnings;
use 5.018;

my %seen;
for my $string (qw/ one two three four one five six four seven two one /) {
    if ( $seen{$string} ) {
        say "saw $string";
    }
    else {
        $seen{$string}++;
        say "new $string";
    }
}
The question "How is the hash used to find unique items" goes into more detail.
As mentioned in a comment, you'd use a library like Digest::MD5 to generate the MD5 strings for the files. Hooking the two together is left as an exercise for the reader.
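One way to hook the two together, as a sketch: key the %seen hash on each file's MD5 digest instead of on a string, which yields the if/else the question asks for. It assumes the file names arrive in @ARGV; `check_file` is a hypothetical helper name, and storing the first file name as the hash value (rather than a counter) is an illustrative choice so the duplicate message can name the original.

```perl
use strict;
use warnings;
use 5.018;
use Digest::MD5;

my %seen;    # MD5 digest => first file seen with that digest

# Hypothetical helper: report whether $path duplicates an earlier file.
# Returns 1 for a duplicate, 0 for a new file.
sub check_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;

    if ( exists $seen{$md5} ) {
        say "duplicate found: $path (same MD5 as $seen{$md5})";
        return 1;
    }
    else {
        # New file: remember its digest for later checks.
        $seen{$md5} = $path;
        say "new file: $path";
        return 0;
    }
}

check_file($_) for @ARGV;
```

Run it as, say, `perl check_dups.pl *.txt` (the script name is hypothetical). As the first answer notes, a matching MD5 only means a probable duplicate; compare the file contents if a collision would matter.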