Jeff

Reputation: 31

How can I extract the audio data from an mp3 file?

I need to create a metadata-independent hash of an mp3 file (i.e. the same hash should be computed after a retag). How can I extract only the audio data into memory, without actually running it through a decompressor?

MAD (http://www.underbit.com/products/mad/) seems like a good starting point, but it does not obviously expose a function for doing this.

Any pointers appreciated!

Upvotes: 3

Views: 8588

Answers (6)

technosaurus

Reputation: 7802

I wrote this bare-bones little snippet for a Linux box with an old mp3 player that couldn't handle tags. What is left is just the mp3 headers and data (on stdout, as coded), which you can use for your md5.

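#define _GNU_SOURCE   // for memmem()
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#define DUMPTAGS
int main(int argc, char **argv){
   unsigned char buf[4096];
   ssize_t len = 0;
   int fd;
   if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0) return 1;
   while ((len = read(fd, buf, 10)) > 0){ // handle ID3v2 tags (maybe multiple)
      if (len == 10 && buf[0]=='I' && buf[1]=='D' && buf[2]=='3'){
         // bytes 6..9: 28-bit "syncsafe" size (7 bits per byte), excluding this 10-byte header
         size_t left = buf[9] | (buf[8] << 7) | (buf[7] << 14) | (buf[6] << 21);
         while (left > 0){ // the tag can be bigger than buf, so consume it in chunks
            len = read(fd, buf, left < sizeof buf ? left : sizeof buf);
            if (len <= 0) return 1;
#ifdef DUMPTAGS
            write(2, buf, len);
#endif
            left -= len;
         }
      } else break;
   }
   // buf now holds the first bytes after the tag(s); copy the rest to stdout
   while (len > 0 && write(1, buf, len) == len){
      const unsigned char tag[3] = {'T','A','G'};
      unsigned char *end;
      len = read(fd, buf, sizeof buf);
      if (len <= 0) break;
      // NOTE: "TAG" can also occur by chance in audio data or straddle two reads;
      // a stricter check would only look at the file's last 128 bytes
      end = memmem(buf, len, tag, 3);
      if (end){ // handle ID3v1 tag (should only be 1)
         write(1, buf, end - buf);
#ifdef DUMPTAGS
         write(2, end, len - (end - buf));
#endif
         break;
      }
   }
   close(fd);
   return 0;
}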

Upvotes: 0

jackpots

Reputation: 21

ffmpeg alone can calculate an MD5 hash of just the audio stream of a file, i.e. without the metadata.

Use:

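ffmpeg -v quiet -i "$file" -map 0:a -acodec copy -f md5 -

Here -v takes a log level, so it needs a value such as quiet, and -map 0:a keeps only the audio streams, so embedded cover art (which ffmpeg exposes as a video stream) is not included in the hash.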

Note that FLAC files already store an MD5 hash of the decoded audio in their STREAMINFO metadata block.
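For FLAC, that stored hash can be read straight out of the file. A minimal sketch in C, assuming the caller has read at least the first 42 bytes into memory and relying on STREAMINFO being the first metadata block (the FLAC format requires this); flac_streaminfo_md5 is a hypothetical helper name, not a libFLAC function:

```c
#include <stddef.h>
#include <string.h>

/* Copies the 16-byte MD5 signature out of a FLAC file's STREAMINFO block.
 * Layout assumed: "fLaC" (4 bytes) + metadata block header (4 bytes)
 * + STREAMINFO body (34 bytes), 42 bytes in total.
 * Returns 1 on success, 0 if the data does not look like FLAC. */
int flac_streaminfo_md5(const unsigned char *buf, size_t len,
                        unsigned char md5[16]) {
    if (len < 42 || memcmp(buf, "fLaC", 4) != 0)
        return 0;
    /* Block header: 1-bit "last block" flag, 7-bit type (0 = STREAMINFO),
     * then a 24-bit big-endian body length, which is 34 for STREAMINFO. */
    unsigned type = buf[4] & 0x7F;
    unsigned long blen = ((unsigned long)buf[5] << 16) | (buf[6] << 8) | buf[7];
    if (type != 0 || blen != 34)
        return 0;
    /* The fields before the MD5 (block sizes, frame sizes, sample rate,
     * channels, bits per sample, total samples) take 18 bytes. */
    memcpy(md5, buf + 8 + 18, 16);
    return 1;
}
```

Because this MD5 covers the decoded samples, it does not change when the file is retagged.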

Upvotes: 2

Jason Pepas

Reputation: 444

I recently needed to solve this problem as well (detecting duplicate mp3 files which had differing ID3 tags). The easiest thing to do was to use ffmpeg to make a copy of the mp3 file with all of the ID3 tags stripped, and then take an md5 sum of that.

See https://github.com/pepaslabs/mp3md5sum

Upvotes: 1

x-x

Reputation: 7515

How can I extract the audio data only out into memory, without actually running it through a decompressor?

You can't extract the audio data without decompressing it - it's compressed! However, if you just want the raw compressed stream, read on!

The typical mp3 audio file will be divided into sections:
[likely metatag]
[possible junk]
[possible XING/LAME tags [possible more junk]]
[mp3 audio frames]
[possible metatag]

Likely metatag: Most mp3 audio files will have an ID3 tag at the start. Be aware that some users may tag their mp3 files with different tagging formats, such as APE, so you will need to account for that too.
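Skipping the leading ID3v2 tag(s) only requires the 10-byte tag header. A minimal sketch in C, assuming the start of the file is already in memory; skip_id3v2 is a hypothetical helper, and APE or other head-of-file tag formats are not handled:

```c
#include <stddef.h>

/* Returns the offset just past any ID3v2 tags at the start of buf
 * (0 if there are none). Handles several back-to-back tags and the
 * optional ID3v2.4 footer. APE and other formats are NOT detected. */
size_t skip_id3v2(const unsigned char *buf, size_t len) {
    size_t off = 0;
    while (len - off >= 10 && buf[off] == 'I' && buf[off + 1] == 'D' &&
           buf[off + 2] == '3') {
        const unsigned char *h = buf + off;
        /* Bytes 6..9: 28-bit "syncsafe" size, 7 bits per byte. It excludes
         * the 10-byte header and the 10-byte footer (if present). */
        size_t size = ((size_t)(h[6] & 0x7F) << 21) | ((size_t)(h[7] & 0x7F) << 14) |
                      ((size_t)(h[8] & 0x7F) << 7) | (size_t)(h[9] & 0x7F);
        size_t total = 10 + size + ((h[5] & 0x10) ? 10 : 0); /* flag 0x10: footer */
        if (total > len - off)
            return len; /* truncated tag: no audio left in this buffer */
        off += total;
    }
    return off;
}
```

The returned offset is where the search for the first audio frame (or for junk and XING/LAME frames) should begin.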

Possible junk: Some mp3 audio files have been tagged, re-tagged and converted so many times that the metatag header may not give you an accurate offset to the first audio frame, as remnants of previous tags can be left behind. foobar2000 has an option to fix this.

Possible XING/LAME tags: These are contained within an mp3 audio frame, though they do not contain actual audio. madplay has code that shows you how to read and parse these frames. The XING/LAME header may contain a frame count, so it's worth parsing. Again, if the file has been through many different taggers and editors, there may be several malformed, invalid audio frames here.
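A sketch of recognising the XING/LAME frame and reading its frame count. It assumes the 4-byte frame header has already been validated and that the frame carries no 2-byte CRC; the tag is looked for right after the side information, whose size depends on MPEG version and channel mode. xing_frame_count is a hypothetical helper, not madplay's API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* If the frame starting at f is a XING/LAME ("Xing" or "Info") frame,
 * returns 1 and stores its frame count in *frames (0 if the count field
 * is absent). Returns 0 for an ordinary frame.
 * Sketch assumption: no CRC follows the 4-byte header. */
int xing_frame_count(const unsigned char *f, size_t len, uint32_t *frames) {
    if (len < 4) return 0;
    int mpeg1 = ((f[1] >> 3) & 0x03) == 0x03; /* version bits 11 = MPEG-1 */
    int mono = ((f[3] >> 6) & 0x03) == 0x03;  /* channel mode 11 = mono */
    /* 4-byte header + side information (32/17 bytes MPEG-1, 17/9 MPEG-2/2.5) */
    size_t off = 4 + (mpeg1 ? (mono ? 17 : 32) : (mono ? 9 : 17));
    if (len < off + 8) return 0;
    const unsigned char *x = f + off;
    if (memcmp(x, "Xing", 4) != 0 && memcmp(x, "Info", 4) != 0) return 0;
    uint32_t flags = ((uint32_t)x[4] << 24) | ((uint32_t)x[5] << 16) |
                     ((uint32_t)x[6] << 8) | x[7];
    *frames = 0;
    if ((flags & 0x1) && len >= off + 12) /* bit 0: frame count present */
        *frames = ((uint32_t)x[8] << 24) | ((uint32_t)x[9] << 16) |
                  ((uint32_t)x[10] << 8) | x[11];
    return 1;
}
```

Since this frame holds no audio, whether to include it in the hash is a design choice; skipping it keeps the hash limited to real audio frames.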

MP3 audio frames: The actual compressed stream, broken into 'frames'. Each frame begins with a sync pattern of 11 set bits, commonly written as 0xFFE: a 0xFF byte followed by a byte whose top three bits are set.
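The sync bits alone are easy to hit by accident, so parsers usually decode the rest of the 4-byte header too. A sketch in C, limited to Layer III (the usual case for .mp3 files) and not handling free-format bitrates; mp3_frame_length is a hypothetical helper. It rejects reserved field values and returns the frame length, so a caller can check that the next frame starts where expected:

```c
#include <stddef.h>

/* Decodes a 4-byte MPEG audio frame header and returns the frame length
 * in bytes, or 0 if h is not a plausible Layer III header. */
size_t mp3_frame_length(const unsigned char h[4]) {
    static const int kbps_v1[16] = {0, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                                    160, 192, 224, 256, 320, 0};
    static const int kbps_v2[16] = {0, 8, 16, 24, 32, 40, 48, 56, 64, 80,
                                    96, 112, 128, 144, 160, 0};
    static const int rate_v1[4] = {44100, 48000, 32000, 0};

    if (h[0] != 0xFF || (h[1] & 0xE0) != 0xE0) return 0; /* 11 sync bits */
    int version = (h[1] >> 3) & 0x03; /* 0: MPEG-2.5, 1: reserved, 2: MPEG-2, 3: MPEG-1 */
    int layer = (h[1] >> 1) & 0x03;   /* 1 = Layer III */
    int br_idx = h[2] >> 4;
    int sr_idx = (h[2] >> 2) & 0x03;
    int padding = (h[2] >> 1) & 0x01;
    if (version == 1 || layer != 1 || br_idx == 0 || br_idx == 15 || sr_idx == 3)
        return 0; /* reserved values; index 0 ("free format") is not handled */

    int kbps = (version == 3) ? kbps_v1[br_idx] : kbps_v2[br_idx];
    /* MPEG-2 halves and MPEG-2.5 quarters the MPEG-1 sample rates */
    int rate = rate_v1[sr_idx] >> (version == 3 ? 0 : version == 2 ? 1 : 2);
    /* Layer III: 1152 samples per frame (MPEG-1) or 576 (MPEG-2/2.5) */
    int coef = (version == 3) ? 144 : 72;
    return (size_t)(coef * kbps * 1000 / rate + padding);
}
```

For example, a 128 kbps, 44.1 kHz MPEG-1 frame without padding is floor(144 * 128000 / 44100) = 417 bytes.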

Possible metatag: It's not uncommon to find more metatags at the end of the file; ID3v1, APE and Lyrics3 tags can all be found here.

To find the audio frames offset, you will need to parse any metatag headers, then begin looking for the sync bit pattern. You can't just begin looking for the sync pattern from the start of the file, as not all taggers correctly support unsynchronization, so the metatag itself may contain the 0xFFE pattern.
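Once the tag offset is known, the scan itself is a simple byte search. In this sketch (hypothetical helper find_frame_sync), start is assumed to be the end of the leading metatag(s), for example the value obtained by parsing the ID3v2 header, and only the 11 sync bits are checked:

```c
#include <stddef.h>

/* Returns the offset of the first byte pair at or after `start` that
 * matches the 11-bit MPEG frame sync, or `len` if there is none.
 * A real parser should also validate the rest of the header and confirm
 * that another frame follows at the computed frame length. */
size_t find_frame_sync(const unsigned char *buf, size_t len, size_t start) {
    for (size_t i = start; i + 1 < len; i++)
        if (buf[i] == 0xFF && (buf[i + 1] & 0xE0) == 0xE0)
            return i;
    return len;
}
```

Starting at 0 instead of the tag end can match a sync-like byte pair stored inside the tag, which is exactly the failure described above.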

Once you have the offset to the first audio frame, you should look at the end of the file and calculate how much non-audio data is there, so you know when to stop parsing the audio. Once you have the offset to the start of the audio data and the offset to the end of the audio data, you can pass the audio data through your hash/checksum function!
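Putting the end-of-file check and the hash together, a minimal sketch in C. It assumes the whole file is in memory and that the only trailing tag is ID3v1 (APE and Lyrics3 trailers are not detected). The hash is 64-bit FNV-1a, swapped in only because it needs no library; for real duplicate detection use MD5 or SHA-1 from a crypto library. audio_end and hash_range are hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* End offset of the audio data: the file length minus a trailing
 * 128-byte ID3v1 tag (which starts with "TAG"), if present.
 * APE and Lyrics3 trailers are NOT handled in this sketch. */
size_t audio_end(const unsigned char *buf, size_t len) {
    if (len >= 128 && memcmp(buf + len - 128, "TAG", 3) == 0)
        return len - 128;
    return len;
}

/* 64-bit FNV-1a over buf[start, end): a dependency-free stand-in for MD5. */
uint64_t hash_range(const unsigned char *buf, size_t start, size_t end) {
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    for (size_t i = start; i < end; i++) {
        h ^= buf[i];
        h *= 1099511628211ULL; /* FNV prime */
    }
    return h;
}
```

With start set to the first frame offset, hash_range(buf, start, audio_end(buf, len)) gives the same value before and after an ID3v1 retag.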

Upvotes: 7

count0

Reputation: 2621

You can use ffmpeg to access the audio content directly by using copy mode. The format does not matter, since in copy mode the API gives you the raw (still compressed) data from whatever container it is in. You can also demux and decode if you have a video or want to work on the decoded audio data.

Check out ffmpeg's examples for a quick intro on how to do this. By "using ffmpeg" I mean not the command-line tool but the FFmpeg libraries (libavformat, libavcodec) from within C/C++, even though I think you could also do this from the command line with the ffmpeg tool, sending its output to stdout and piping it to md5sum or something equivalent (if you're a Unix user, that is).

The special case "-acodec copy" tells ffmpeg to copy the audio packets as they are, without decoding or re-encoding them. In other words, no transcoding of the audio occurs.

Upvotes: 3

Marc B

Reputation: 360672

What kind of audio data? The raw decoded PCM stream? The individual MP3 frames? What if it's an MP3 encapsulated in a .wav? It could still have a .mp3 extension, but have the full .wav wrapper around it.
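For the MP3-in-WAV case, the compressed stream sits in the RIFF "data" chunk, and the "fmt " chunk's format tag 0x0055 marks it as MPEG Layer 3. A minimal sketch of walking the chunk list, assuming the file header is in memory; wav_data_chunk is a hypothetical helper, and metadata chunks such as LIST are skipped because only "fmt " and "data" are inspected:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Finds the payload of the "data" chunk in a RIFF/WAVE buffer and reports
 * the format tag from "fmt " (0x0055 = MPEG Layer 3, 0 if not seen).
 * Returns 1 and fills *off, *size, *fmt on success, 0 otherwise. */
int wav_data_chunk(const unsigned char *buf, size_t len,
                   size_t *off, size_t *size, unsigned *fmt) {
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return 0;
    *fmt = 0;
    size_t pos = 12;
    while (pos + 8 <= len) { /* each chunk: 4-byte id, 4-byte LE size, body */
        size_t csize = le32(buf + pos + 4);
        const unsigned char *body = buf + pos + 8;
        if (memcmp(buf + pos, "fmt ", 4) == 0 && csize >= 2 && pos + 10 <= len)
            *fmt = body[0] | (body[1] << 8);
        if (memcmp(buf + pos, "data", 4) == 0) {
            *off = pos + 8;
            *size = (csize <= len - *off) ? csize : len - *off; /* clamp if truncated */
            return 1;
        }
        pos += 8 + csize + (csize & 1); /* chunk bodies are padded to even size */
    }
    return 0;
}
```

The bytes in [off, off + size) can then be treated like a bare MP3 stream when fmt is 0x0055.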

Stripping off an ID3v1 tag is simple: it's just 128 bytes at the end of the file, starting with "TAG". ID3v2 is a bit harder: it's variable-length and prepended to the start of the MP3, and you'd have to parse out the length field (4 bytes where only the lowest 7 bits of each byte are used, giving a 28-bit maximum length for the tag; the value does not include the 10-byte tag header). The .wav wrapper would be harder still: I don't know any details about what .wav imposes as metadata.
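The 7-bits-per-byte length field works out as follows: 00 00 02 01 decodes to 2 * 128 + 1 = 257, and the largest value, 7F 7F 7F 7F, is 2^28 - 1 = 268435455 bytes. A small C sketch of both directions (the helper names are hypothetical):

```c
#include <stdint.h>

/* Decodes a 4-byte ID3v2 "syncsafe" integer: the high bit of every byte
 * is 0, so each byte contributes 7 bits, 28 bits in total. */
uint32_t syncsafe_decode(const unsigned char b[4]) {
    return ((uint32_t)(b[0] & 0x7F) << 21) | ((uint32_t)(b[1] & 0x7F) << 14) |
           ((uint32_t)(b[2] & 0x7F) << 7) | (uint32_t)(b[3] & 0x7F);
}

/* Inverse: spreads the low 28 bits of n over four 7-bit bytes. */
void syncsafe_encode(uint32_t n, unsigned char b[4]) {
    b[0] = (n >> 21) & 0x7F;
    b[1] = (n >> 14) & 0x7F;
    b[2] = (n >> 7) & 0x7F;
    b[3] = n & 0x7F;
}
```

To skip the whole tag, add the 10-byte header to the decoded value (plus another 10 bytes when an ID3v2.4 footer is present).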

Upvotes: 2
