Reputation: 347
As a personal project (in order to learn Python better), I started working on a duplicate file remover (especially for .mp3 files, since I thought of it while trying to organise my full-of-duplicates music collection). I'm fairly clear on how to proceed for two cases: matching file names and offering for deletion only those with a similarity ratio above 0.7, and using MD5 sums for files that are identical but have completely different names (e.g. "metallica-nothing else matters" and "Track1"). The problem is that I don't know what to do about files that have different names and also differ slightly in content. For example, "nothing else matters" and "Track1" are the same except that "Track1" has 2 seconds of silence at the end. My question is: is there some kind of method or algorithm that checks similarity between the files themselves? Something like string matching, but on files? It doesn't matter if it's a complicated algorithm; the harder the better, since I'm doing this only to learn :D
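For reference, the two checks described above could look roughly like the sketch below. It uses only the standard library (`difflib` and `hashlib`); the helper names `name_similarity`, `md5_of_file` and `likely_duplicates` are made up for illustration, and 0.7 is just the threshold mentioned in the question.

```python
import hashlib
from difflib import SequenceMatcher


def name_similarity(name_a: str, name_b: str) -> float:
    """Return a similarity ratio in [0, 1] for two file names, ignoring case."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def likely_duplicates(name_a: str, name_b: str, threshold: float = 0.7) -> bool:
    """True when the name similarity is above the (illustrative) threshold."""
    return name_similarity(name_a, name_b) > threshold


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large audio files are never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with the same `md5_of_file` result are byte-for-byte identical for practical purposes, which is why this check cannot catch the "2 seconds of silence" case: any change to the bytes changes the hash.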
Upvotes: 0
Views: 624
Reputation: 380
You can also look at the win32 module; here is the link:
http://timgolden.me.uk/python/index.html
Upvotes: 0
Reputation: 12951
You could use Chromaprint, which computes an acoustic fingerprint for a piece of music. It should be able to find similar music files.
If you want to push this further, you could use the MusicBrainz API to look up the exact information about a piece of music.
These libraries are used in two great music library tagging and sorting applications I use: Picard and beets.
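To give an idea of how fingerprints can be compared, here is a minimal sketch. It assumes each fingerprint is already available as a list of 32-bit integers (for example, the kind of raw output Chromaprint's `fpcalc -raw` prints), and it is not Chromaprint's own matching code. The function names, `max_offset` and `min_overlap` are hypothetical, and the cutoff for "same song" would need tuning on real files.

```python
def bit_error_rate(fp_a, fp_b, offset=0, min_overlap=10):
    """Fraction of differing bits between two fingerprints.

    A positive offset skips the first `offset` items of fp_a, a negative
    one skips items of fp_b. Returns 1.0 when the overlap is too short.
    """
    if offset >= 0:
        pairs = list(zip(fp_a[offset:], fp_b))
    else:
        pairs = list(zip(fp_a, fp_b[-offset:]))
    if len(pairs) < min_overlap:
        return 1.0
    # Mask to 32 bits so signed values are handled as well.
    differing = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in pairs)
    return differing / (32 * len(pairs))


def best_match(fp_a, fp_b, max_offset=20, min_overlap=10):
    """Lowest bit error rate over a range of alignments (0.0 means identical)."""
    return min(
        bit_error_rate(fp_a, fp_b, offset, min_overlap)
        for offset in range(-max_offset, max_offset + 1)
    )
```

Because the comparison only looks at the overlapping part and tries several alignments, extra silence at the start or end of one file should not prevent a match, which is the case the question asks about.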
Upvotes: 4