cpp_ninja

Reputation: 347

Remove duplicate files using python

As a personal project (to learn Python better), I started working on a duplicate file remover, mainly for .mp3 files, since I thought of it while trying to organise my full-of-duplicates music collection. I'm fairly clear on how to proceed for two cases: matching file names and offering for deletion only those with a similarity ratio above 0.7, and using MD5 sums for files that are identical but have completely different names (e.g. "metallica-nothing else matters" and "Track1"). The problem is that I don't know what to do about files that have different names and also differ slightly in content. For example, "nothing else matters" and "Track1" are the same recording, except that "Track1" has 2 seconds of silence at the end, so their MD5 sums no longer match. My question is: is there some kind of method or algorithm that checks similarity between the files themselves? Something like string matching, but on file contents? It doesn't matter if it's a complicated algorithm; the harder the better, since I'm doing this only to learn :D
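For reference, the two checks I already have in mind look roughly like the sketch below. It assumes `difflib.SequenceMatcher` for the name ratio and `hashlib.md5` for the checksum; the helper names (`name_similarity`, `md5sum`, `is_candidate_duplicate`) are made up for illustration, and 0.7 is just the threshold mentioned above.

```python
import hashlib
from difflib import SequenceMatcher
from pathlib import Path


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two file names (case and extension ignored)."""
    return SequenceMatcher(None, Path(a).stem.lower(), Path(b).stem.lower()).ratio()


def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large files are not loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_candidate_duplicate(path_a, path_b, threshold=0.7):
    """True if the names are similar enough or the contents are byte-identical."""
    if name_similarity(path_a, path_b) > threshold:
        return True
    return md5sum(path_a) == md5sum(path_b)
```

This catches renamed copies and byte-identical files, but not the "2 seconds of silence" case, because a single changed byte gives a different MD5 sum.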

Upvotes: 0

Views: 624

Answers (2)

Infinite_Loop

Reputation: 380

You can also look at the win32 module for Python. Here is the link:

http://timgolden.me.uk/python/index.html

Upvotes: 0

madjar

Reputation: 12951

You could use Chromaprint, which computes an acoustic fingerprint for a piece of music. Because the fingerprint is derived from the audio itself rather than from the raw bytes, it should be able to find similar music files even when the names and encodings differ.

If you want to push this further, you could use the MusicBrainz API to look up the exact information about a piece of music.

These libraries are used in two great music library tagging and sorting applications I use: Picard and beets.
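To give an idea of how two fingerprints might be compared, here is a minimal sketch. It assumes you already have the raw fingerprints as lists of 32-bit integers (for example from Chromaprint's `fpcalc` tool run with its `-raw` option, which is an assumption about your setup). The functions `bit_error_rate` and `best_match`, and the shift range, are hypothetical illustrations and not Chromaprint's own matching code.

```python
def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits over the overlapping part of two raw fingerprints."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 1.0
    # XOR marks the bits that differ; mask to 32 bits so signed values also work.
    differing = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in zip(fp_a, fp_b))
    return differing / (32 * n)


def best_match(fp_a, fp_b, max_offset=20, min_overlap=10):
    """Lowest bit error rate found while shifting one fingerprint against the other."""
    best = 1.0
    for offset in range(-max_offset, max_offset + 1):
        # A positive offset skips items at the start of fp_a, a negative one at the start of fp_b.
        a = fp_a[offset:] if offset >= 0 else fp_a
        b = fp_b if offset >= 0 else fp_b[-offset:]
        if min(len(a), len(b)) < min_overlap:
            continue
        best = min(best, bit_error_rate(a, b))
    return best
```

A result near 0 suggests the same recording, while unrelated audio tends to land near 0.5; where to put the cut-off is something you would have to tune on your own collection. Extra silence at the end only lengthens one fingerprint, so the overlapping part still matches, and the shifting loop covers a few seconds of extra lead-in.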

Upvotes: 4
