Reputation: 1
How can I detect duplicate MP3 files that differ only in their ID3 tags, preferably in Java? The files have the same encoding and format. The solution should work with both versions of ID3: ID3v1 and ID3v2.
This is my code so far, but it does not work with ID3v1 tags.
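// Needs java.io.File, java.util.Iterator, java.util.Vector, javax.sound.sampled.*
// and Apache Commons Codec on the classpath.
try {
    // Backslashes must be escaped in a Java string literal ("\t" is a tab),
    // and the directory needs a trailing separator before the file name.
    String filepath = "c:\\tmp\\";
    Vector<String> mp3_files = new Vector<String>();
    mp3_files.add(filepath + "test_with_id3.mp3");
    mp3_files.add(filepath + "test_without_id3");
    Iterator<String> i_mp3fp = mp3_files.iterator();
    while (i_mp3fp.hasNext()) {
        String mp3_fp = i_mp3fp.next();
        File file = new File(mp3_fp);
        AudioInputStream in = AudioSystem.getAudioInputStream(file);
        AudioFormat baseFormat = in.getFormat();
        // Decode to 16-bit signed little-endian PCM so that only audio data is hashed.
        AudioFormat decodedFormat = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                baseFormat.getSampleRate(), 16, baseFormat.getChannels(),
                baseFormat.getChannels() * 2, baseFormat.getSampleRate(),
                false);
        AudioInputStream din = AudioSystem.getAudioInputStream(decodedFormat, in);
        String md5 = org.apache.commons.codec.digest.DigestUtils.md5Hex(din);
        System.out.println("Name: " + mp3_fp + " | Hash: " + md5);
        din.close();
    }
} catch (Exception e) {
    // The original snippet left the try block open; report any failure here.
    e.printStackTrace();
}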
When I wrote this, I thought I had to compare MP3 files with different encodings. I now think a better solution would be to read the MP3 files, ignore all the ID3 tags, compute a checksum of what remains, and compare the checksums. Is there a library for reading and filtering an MP3 file?
Thank you guys for your help!
Upvotes: 0
Views: 868
Reputation: 38898
While there is surely a way to do this in Java, I suspect it might be quicker to use FFmpeg + bash.
for file in *.mp3
do
ffmpeg -i "$file" -f s16le - | md5 > "$file.md5"
done
Upvotes: 1
Reputation: 340933
I don't have any experience with the MP3 and ID3 tag formats, but a quick look at Wikipedia reveals that:
The ID3v1 tag occupies 128 bytes, beginning with the string TAG. The tag was placed at the end of the file
So read the whole MP3 file and, if its last 128 bytes begin with TAG, skip them.
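The ID3v1 check could look roughly like this. It is only a sketch: the class and method names (Id3v1Stripper, hasId3v1Tag, stripId3v1) are made up for illustration, and it assumes the whole file has already been read into a byte array.

```java
import java.util.Arrays;

/** Sketch: drop a trailing ID3v1 tag from an MP3 file held in memory. */
final class Id3v1Stripper {
    // An ID3v1 tag is always exactly 128 bytes long.
    private static final int TAG_LENGTH = 128;

    /** Returns true if the last 128 bytes start with the ASCII marker "TAG". */
    static boolean hasId3v1Tag(byte[] data) {
        if (data.length < TAG_LENGTH) {
            return false;
        }
        int start = data.length - TAG_LENGTH;
        return data[start] == 'T' && data[start + 1] == 'A' && data[start + 2] == 'G';
    }

    /** Returns a copy without the ID3v1 tag, or the same array if no tag is present. */
    static byte[] stripId3v1(byte[] data) {
        return hasId3v1Tag(data) ? Arrays.copyOf(data, data.length - TAG_LENGTH) : data;
    }
}
```

Checking for the marker first matters because a file without an ID3v1 tag would otherwise lose 128 bytes of real audio data.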
3.1. ID3v2 header
The ID3v2 tag size is stored as a 32 bit synchsafe integer (section 6.2), making a total of 28 effective bits (representing up to 256MB).
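The header format is pretty simple. If the file starts with an ID3v2 header (the three bytes "ID3"), read the tag size from the last four bytes of the 10-byte header and skip the header plus that many bytes; the stored size does not include the header itself.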
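Decoding the synchsafe size could be sketched as follows. The names (Id3v2Skipper, decodeSynchsafe, id3v2Length) are hypothetical, and the code assumes the standard 10-byte header layout: "ID3", two version bytes, one flags byte, four size bytes. It also assumes the ID3v2.4 footer flag (0x10), which adds another 10 bytes after the tag when set.

```java
/** Sketch: compute how many leading bytes an ID3v2 tag occupies. */
final class Id3v2Skipper {
    private static final int HEADER_LENGTH = 10;
    // Flag bit that signals a 10-byte footer after the tag (ID3v2.4).
    private static final int FOOTER_FLAG = 0x10;

    /** Decodes a 4-byte synchsafe integer: 7 significant bits per byte, big-endian. */
    static int decodeSynchsafe(byte[] data, int offset) {
        return ((data[offset] & 0x7F) << 21)
             | ((data[offset + 1] & 0x7F) << 14)
             | ((data[offset + 2] & 0x7F) << 7)
             |  (data[offset + 3] & 0x7F);
    }

    /** Returns the number of bytes to skip at the start, or 0 if there is no ID3v2 tag. */
    static int id3v2Length(byte[] data) {
        if (data.length < HEADER_LENGTH
                || data[0] != 'I' || data[1] != 'D' || data[2] != '3') {
            return 0;
        }
        int flags = data[5] & 0xFF;
        int size = decodeSynchsafe(data, 6);
        int footer = (flags & FOOTER_FLAG) != 0 ? HEADER_LENGTH : 0;
        // Clamp so that a corrupt size field never points past the end of the data.
        return Math.min(data.length, HEADER_LENGTH + size + footer);
    }
}
```

The audio data then starts at offset id3v2Length(data) in the file.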
Once you have the "raw" file, compare contents byte-by-byte or using a hash.
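For the hash comparison, something along these lines might work. It is a sketch under assumptions: DuplicateFinder, sha256Hex and groupByHash are illustrative names, the map keys are file names, and the map values are the file contents with the ID3 tags already removed. SHA-256 is my choice here; MD5 as in the question would do the same job for duplicate detection.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: group files whose tag-free contents hash to the same value. */
final class DuplicateFinder {

    /** Returns the SHA-256 digest of the data as 64 lowercase hex characters. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return String.format("%064x", new BigInteger(1, digest.digest(data)));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Maps each hash shared by two or more files to the names of those files. */
    static Map<String, List<String>> groupByHash(Map<String, byte[]> audioByName) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : audioByName.entrySet()) {
            groups.computeIfAbsent(sha256Hex(entry.getValue()), k -> new ArrayList<>())
                  .add(entry.getKey());
        }
        // Keep only hashes that occur more than once, i.e. the duplicates.
        groups.values().removeIf(names -> names.size() < 2);
        return groups;
    }
}
```

Every list in the returned map is one set of files with identical audio data. For large collections it would be better to stream each file through the digest than to hold all the contents in memory.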
Upvotes: 0