zerodb

Reputation: 61

Check for md5sum to identify duplicate files in Perl

How can I check for duplicate files using md5sum in Perl, in an if statement?

I am looking for a line of code that does this:

if (md5 of new file matches any of the md5sum values of already parsed files) {
    print "duplicate found"
} else {
    add md5sum of new file to the list to check against
    print "new file"
}

Upvotes: 0

Views: 188

Answers (2)

Sinan Ünür

Reputation: 118118

The basic idea is to calculate a hash-code for each file you encounter. In pseudo-code:

my %md5_to_file;

for every file
    push @{ $md5_to_file{ md5 of file } }, file

Then, any value in the %md5_to_file mapping that holds more than one file points to possible duplicates. Since different files can share an MD5, you can then do further checks (for example, a byte-by-byte comparison) to ascertain whether you have collisions or genuine duplicates.
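As a minimal sketch of the pseudo-code above, assuming the files to check are passed on the command line: Digest::MD5 computes the digests and File::Compare does the follow-up byte comparison. The helper names md5_of_file and group_by_md5 are made up for illustration and are not part of either module.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Compare qw(compare);

# Hypothetical helper: hex MD5 digest of a file's contents, read in binary mode.
sub md5_of_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Hypothetical helper: map each digest to the list of paths that produced it.
sub group_by_md5 {
    my @paths = @_;
    my %md5_to_file;
    push @{ $md5_to_file{ md5_of_file($_) } }, $_ for @paths;
    return \%md5_to_file;
}

# Assumption: the candidate files are given as command-line arguments.
my $groups = group_by_md5(@ARGV);
for my $md5 ( sort keys %$groups ) {
    my ( $first, @rest ) = @{ $groups->{$md5} };
    for my $other (@rest) {
        # compare() returns 0 when the two files are byte-for-byte equal.
        my $verdict = compare( $first, $other ) == 0
            ? 'duplicate of'
            : 'same MD5 but different content from';
        print "$other: $verdict $first\n";
    }
}
```

Run it as, say, perl finddups.pl *.txt (the script name is arbitrary). Files with a unique digest produce no output.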

See also "DFW Perl Mongers ONLINE Hackathon Smackdown - Results, Awards, And Code".

Upvotes: 1

Oesor

Reputation: 6652

Generally the idiomatic way of performing this operation is to use a hash.

use strict;
use warnings;
use 5.018;

my %seen;

for my $string (qw/ one two three four one five six four seven two one /) {
    if ( $seen{$string} ) {
        say "saw $string";
    }
    else {
        $seen{$string}++;
        say "new $string";
    }
}

"How is the hash used to find unique items" goes into more detail.

As mentioned in a comment, you'd use a library like Digest::MD5 to generate the MD5 strings for the files. Hooking the two together is left as an exercise for the reader.
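For anyone who wants a starting point for that exercise, here is one possible sketch that combines the %seen idiom with Digest::MD5 in the if/else shape the question asks for. It assumes the files arrive as command-line arguments, and check_file is a hypothetical helper name, not something the module provides.

```perl
use strict;
use warnings;
use Digest::MD5;

my %seen;    # MD5 hex digest => first path seen with that digest

# Hypothetical helper: returns the earlier path with the same digest,
# or undef if this digest has not been seen before (and records it).
sub check_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $seen{$md5} if exists $seen{$md5};
    $seen{$md5} = $path;
    return;
}

# Assumption: the files to check are given as command-line arguments.
for my $file (@ARGV) {
    my $earlier = check_file($file);
    if ( defined $earlier ) {
        print "duplicate found: $file matches $earlier\n";
    }
    else {
        print "new file: $file\n";
    }
}
```

Storing the path rather than a counter in %seen makes it possible to report which earlier file the duplicate matches. A matching MD5 is strong evidence of identical content but not proof, so compare the files directly if that matters.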

Upvotes: 0
