Sarfaraz Jamal

Reputation: 403

Checking 2 Million Files for Duplicates

We need to check 2 million files to see if they have any duplicates.

What would be the best way to do that?

We have used this tool, http://www.easyduplicatefinder.com/, on roughly 20k files, but soon we will have to check 2 million.

Any ideas on how this could be done in an efficient manner?

Sas

Upvotes: 2

Views: 204

Answers (1)

Tadeck

Reputation: 137290

Compute a checksum of each file with MD5 or SHA-1 (SHA-1 is preferable, as collisions are less likely), or even with both, in which case an accidental collision is so unlikely that you can sleep well knowing you have made no mistake.

Then compare the checksums. Files with matching checksums have matching contents, so this compares the contents only. If you also want to compare the file names, take them into account when comparing.
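As a rough illustration of the approach, here is a minimal Python sketch that hashes every file under a directory with SHA-1 and groups the paths by checksum. The function names `file_sha1` and `find_duplicates`, the chunk size, and the choice to walk a single root directory are assumptions made for this example, not something from the question.

```python
import hashlib
import os
from collections import defaultdict


def file_sha1(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group all files under `root` by checksum and return only the
    groups that contain two or more files (hypothetical helper)."""
    by_checksum = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_checksum[file_sha1(path)].append(path)
    # Keep only checksums shared by more than one file.
    return {
        checksum: sorted(paths)
        for checksum, paths in by_checksum.items()
        if len(paths) > 1
    }
```

Calling `find_duplicates("/path/to/files")` (the path is a placeholder) would return a dictionary mapping each shared checksum to the list of files that have it. To also require matching names, the grouping key could be changed to a tuple of the file name and the checksum.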

That is all. Quite (very) reliable.

Upvotes: 5
