Manuel

Reputation: 1

Detect duplicate MP3 files with ID3 tags?

How can I detect duplicate MP3 files that have different ID3 tags, preferably in Java? The files have the same encoding and format. The solution should work with both versions of ID3: ID3v1 and ID3v2.

This is my code so far, but it does not work with ID3v1 tags.

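import java.io.File;
import java.util.Iterator;
import java.util.Vector;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

try {

       // Backslashes must be escaped: "c:\tmp" would contain a tab character.
       // The trailing separator keeps directory and file name apart.
       String filepath = "c:\\tmp\\";

       Vector<String> mp3_files = new Vector<String>();
       mp3_files.add(filepath + "test_with_id3.mp3");
       mp3_files.add(filepath + "test_without_id3");

       Iterator<String> i_mp3fp = mp3_files.iterator();

       while (i_mp3fp.hasNext()){

          String mp3_fp = i_mp3fp.next();

          AudioInputStream din = null;
          File file = new File(mp3_fp);
          AudioInputStream in = AudioSystem.getAudioInputStream(file);
          AudioFormat baseFormat = in.getFormat();

          // Decode to 16-bit signed little-endian PCM so that only the audio is hashed
          AudioFormat decodedFormat = new AudioFormat(
             AudioFormat.Encoding.PCM_SIGNED,
             baseFormat.getSampleRate(), 16, baseFormat.getChannels(),
             baseFormat.getChannels() * 2, baseFormat.getSampleRate(),
             false);
          din = AudioSystem.getAudioInputStream(decodedFormat, in);

          String md5 = org.apache.commons.codec.digest.DigestUtils.md5Hex( din );
          System.out.println("Name: "+mp3_fp+" | Hash: "+md5);
          din.close();
          in.close();

       }

} catch (Exception e) {
       // Unsupported format or I/O problem
       e.printStackTrace();
}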

When I wrote this, I thought I had to compare MP3 files with different encodings. Anyway, I think a better solution would be to read the MP3 files, ignore all ID3 tags, compute a checksum of the remaining data, and compare the checksums. Is there a library for reading and filtering an MP3 file?

Thank you guys for your help!

Upvotes: 0

Views: 868

Answers (2)

Stu Thompson

Reputation: 38898

Convert the files to raw PCM, and MD5 the output

While there is surely a way to do this in Java, I suspect it might be quicker to use FFmpeg + bash.

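# md5 is the BSD/macOS tool; on Linux use md5sum instead
for file in *.mp3
do
    # Quote the variable so file names with spaces work
    ffmpeg -i "$file" -f s16le - | md5 > "$file.md5"
done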

Upvotes: 1

Tomasz Nurkiewicz

Reputation: 340933

I don't have any experience with the MP3 and ID3 tag formats, but a quick look at Wikipedia reveals that:

ID3v1

The ID3v1 tag occupies 128 bytes, beginning with the string TAG. The tag was placed at the end of the file

If the last 128 bytes of the file begin with TAG, read the whole MP3 file except those last 128 bytes.
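A minimal sketch of that check, assuming the whole file has already been read into a byte array. The class and method names (Id3v1Trimmer, audioEnd) are made up for illustration and are not part of any library:

```java
import java.nio.charset.StandardCharsets;

final class Id3v1Trimmer {
    static final int ID3V1_LENGTH = 128;

    /**
     * Returns the index just past the last byte that is not part of an
     * ID3v1 tag. If the file carries no ID3v1 tag, this is file.length.
     */
    static int audioEnd(byte[] file) {
        if (file.length < ID3V1_LENGTH) {
            return file.length; // too short to hold a tag
        }
        int tagStart = file.length - ID3V1_LENGTH;
        // An ID3v1 tag begins with the three ASCII characters "TAG"
        String marker = new String(file, tagStart, 3, StandardCharsets.ISO_8859_1);
        return marker.equals("TAG") ? tagStart : file.length;
    }
}
```

Checking for the TAG marker first matters: cutting 128 bytes off a file without an ID3v1 tag would remove real audio data and change the hash.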

ID3v2

3.1. ID3v2 header

The ID3v2 tag size is stored as a 32 bit synchsafe integer (section 6.2), making a total of 28 effective bits (representing up to 256MB).

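The header format is pretty simple. If the file starts with an ID3v2 header (the bytes ID3), read the tag size from it and skip that many bytes plus the 10-byte header itself, because the stored size does not include the header.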
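As a rough sketch of how the skip length could be computed, assuming the file is held in a byte array and the tag sits at the very start of the file. Id3v2Skipper and audioStart are hypothetical names; the handling of the footer flag follows my reading of the ID3v2.4 header layout:

```java
final class Id3v2Skipper {
    static final int HEADER_LENGTH = 10;

    /**
     * Returns the offset of the first byte after a leading ID3v2 tag,
     * or 0 if the file does not start with one.
     */
    static int audioStart(byte[] file) {
        if (file.length < HEADER_LENGTH
                || file[0] != 'I' || file[1] != 'D' || file[2] != '3') {
            return 0;
        }
        // Bytes 6..9 hold the size as a synchsafe integer:
        // only the low 7 bits of each byte are used (4 * 7 = 28 bits).
        int size = ((file[6] & 0x7F) << 21)
                 | ((file[7] & 0x7F) << 14)
                 | ((file[8] & 0x7F) << 7)
                 |  (file[9] & 0x7F);
        // Flag bit 0x10 in byte 5 announces a 10-byte footer,
        // which the stored size does not include either.
        int footer = (file[5] & 0x10) != 0 ? 10 : 0;
        // Clamp so a corrupt size never points past the end of the file
        return Math.min(file.length, HEADER_LENGTH + size + footer);
    }
}
```

For example, size bytes 00 00 02 01 decode to 2 * 128 + 1 = 257, so the audio would start at offset 267 when no footer is present.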

Once you have the "raw" file, compare contents byte-by-byte or using a hash.
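The hash comparison could look roughly like this, using only java.security.MessageDigest from the JDK. It assumes the tag-free audio bytes of each file are already available in memory; DuplicateFinder, md5Hex and groupByDigest are illustrative names, not an existing API:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DuplicateFinder {

    /** Hex-encoded MD5 of the bytes data[from] up to, but excluding, data[to]. */
    static String md5Hex(byte[] data, int from, int to) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, from, to - from);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5
            throw new IllegalStateException(e);
        }
    }

    /** Groups file names by digest; a group with more than one name is a set of duplicates. */
    static Map<String, List<String>> groupByDigest(Map<String, byte[]> audioByName) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : audioByName.entrySet()) {
            byte[] audio = entry.getValue();
            String digest = md5Hex(audio, 0, audio.length);
            groups.computeIfAbsent(digest, k -> new ArrayList<>()).add(entry.getKey());
        }
        return groups;
    }
}
```

MD5 is fine here because the goal is spotting accidental duplicates, not resisting deliberate collisions. For large collections it would probably be better to stream each file through the digest instead of holding all audio data in memory.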

Upvotes: 0
