Reputation: 403
We need to check 2 million files to see if they have any duplicates.
What would be the best way to do that?
We have used this tool, http://www.easyduplicatefinder.com/, to check roughly 20k files,
but soon we will have to check 2 million.
Any ideas on how this could be done efficiently?
Sas
Upvotes: 2
Views: 204
Reputation: 137290
Compute an MD5 or SHA-1 checksum for each file (SHA-1 is preferable, as collisions are less likely), or even both (two files matching on both checksums by accident is so unlikely that you can sleep well knowing you have made no mistake).
Then compare the checksums: files with identical checksums have identical contents. If you also want to compare the file names, take them into account when comparing.
That is all. It is very reliable.
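As a rough illustration of the approach, here is a minimal Python sketch using the standard `hashlib` module. The function names `file_digest` and `find_duplicates`, the 1 MiB chunk size, and the choice to walk a single root directory are all assumptions made for the example, not part of any particular tool.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to keep memory use flat."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, algorithm="sha1"):
    """Map each digest to the paths sharing it, keeping only digests seen more than once."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path, algorithm)].append(path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_duplicates("/some/folder")` (a placeholder path) returns a dictionary whose values are groups of files with the same checksum. Passing `algorithm="md5"` switches the hash, and running both and keeping only the groups that agree gives the "both checksums" variant.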
Upvotes: 5