Clay Raynor

Reputation: 316

What integer type is used for MP3 data frames?

I am writing a universal parser library for various binary formats in Rust as part of a personal project. I've started researching the file structure of MP3 files. As I understand it, an MP3 file structure consists of header and data frames. Each header frame provides meta information about the data frame that follows it. Here is a diagram and a listing of allowed values for MP3 header frames that I am referencing.

I understand the format of the MP3 header. My confusion, or lack of information, surrounds MP3 data frames. I can't seem to find a source that specifies what integer type samples are encoded as in the data frame portion of an MP3 file. Are they 8 bit, 16 bit, 32 bit, signed, unsigned, etc?

The best I can think of is to use a combination of the sample rate and bitrate to calculate what each sample's size should be. However, that doesn't determine whether each sample is a signed or unsigned integer.
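A rough sketch of that calculation, for illustration only. The function name `average_bits_per_sample` is hypothetical, and the figures in `main` (128 kbit/s, 44.1 kHz, stereo) are assumed example values rather than anything taken from a real file:

```rust
/// Average number of encoded bits available per sample, per channel.
/// `bitrate_bps` is in bits per second, `sample_rate_hz` in samples per second.
/// (Hypothetical helper, not part of any existing library.)
fn average_bits_per_sample(bitrate_bps: u32, sample_rate_hz: u32, channels: u32) -> f64 {
    f64::from(bitrate_bps) / (f64::from(sample_rate_hz) * f64::from(channels))
}

fn main() {
    // Assumed example figures: 128 kbit/s, 44.1 kHz, two channels.
    let bits = average_bits_per_sample(128_000, 44_100, 2);
    println!("average bits per sample: {bits:.3}");
}
```

With figures like these the result is not necessarily a whole number, so on its own it would not pin down an integer type.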

I'm not trying to decode these files, I'm just trying to parse them. I've had a surprisingly hard time finding this information. Any information or help someone can offer would be much appreciated.

Upvotes: 1

Views: 612

Answers (1)

maxwellmattryan

Reputation: 123

Although this is not related to .mp3 per se, there could potentially be some helpful information in Will C. Pirkle's book, Designing Audio Effect Plugins in C++.

He discusses the way in which the .wav audio format stores its information. At a 16-bit depth, it uses signed integers from -32,768 to 32,767. This is a range of 2^16 values in a bipolar format, where the exponent corresponds to the bit depth (most commonly 16 or 24).
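As a small sketch of what that range looks like in Rust, assuming 16-bit little-endian signed PCM as used in typical .wav files. The helper names `value_count` and `decode_i16_le` are made up for this example:

```rust
/// Number of distinct values representable at a given bit depth (2^bits).
/// (Hypothetical helper for illustration.)
fn value_count(bit_depth: u32) -> u64 {
    1u64 << bit_depth
}

/// Decode one 16-bit signed little-endian PCM sample from two raw bytes.
/// (Hypothetical helper for illustration.)
fn decode_i16_le(bytes: [u8; 2]) -> i16 {
    i16::from_le_bytes(bytes)
}

fn main() {
    // The bipolar 16-bit range described above.
    println!("16-bit range: {}..={}", i16::MIN, i16::MAX);
    println!("distinct values: {}", value_count(16));

    // 0x8000 in little-endian byte order is the most negative sample.
    println!("decoded: {}", decode_i16_le([0x00, 0x80]));
}
```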

Another important thing to note is that while phase inversion is common in many audio applications, there is no corresponding positive integer for inverting -32,768. To compensate, it's common to treat the value -32,768 as -32,767. This only matters if you are using the value 0 in your processing, which is most often the case. Otherwise, one could extend the upper limit to 32,768.
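A minimal sketch of that workaround in Rust, assuming 16-bit samples. `invert_phase` is a hypothetical name, not something from the book:

```rust
/// Invert the phase of a 16-bit sample.
/// -32,768 has no positive counterpart in i16, so it is treated as -32,767 first.
/// (Hypothetical helper for illustration.)
fn invert_phase(sample: i16) -> i16 {
    let clamped = sample.max(-i16::MAX); // lower bound becomes -32,767
    -clamped // safe: the negation can no longer overflow
}

fn main() {
    // Plain negation of the minimum value cannot be represented.
    println!("checked_neg of MIN: {:?}", i16::MIN.checked_neg());
    println!("invert_phase(MIN): {}", invert_phase(i16::MIN));
    println!("invert_phase(1000): {}", invert_phase(1000));
}
```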

He does state that it's more common for audio processing applications to work with floating-point numbers, either between 0.0f and 1.0f or between -1.0f and 1.0f. The reason is that addition and multiplication are common operations in DSP, and values in these ranges help avoid overflow. In the bipolar integer format, it's too easy to find two numbers whose product or sum falls outside the range. In the range of -1.0f to 1.0f, any two numbers will always produce a product that's still within that range. Unfortunately, addition still requires caution, but eh...
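To illustrate the difference, here is a sketch that assumes 16-bit samples and divides by 32,768 to normalize them. That divisor is one common convention and an assumption here, and `to_float` is a hypothetical helper:

```rust
/// Map a 16-bit sample into the range -1.0..1.0.
/// Dividing by 32,768 is an assumed convention; others exist.
fn to_float(sample: i16) -> f32 {
    f32::from(sample) / 32_768.0
}

fn main() {
    let loud = to_float(i16::MAX);
    let inverted = to_float(i16::MIN);

    // In the integer format, even a small gain overflows.
    println!("i16::MAX * 2 fits in i16: {}", i16::MAX.checked_mul(2).is_some());

    // The product of two normalized samples stays inside the range.
    println!("product in range: {}", (loud * inverted).abs() <= 1.0);

    // The sum of two normalized samples can still leave the range.
    println!("sum in range: {}", (loud + loud).abs() <= 1.0);
}
```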

I'm sorry I don't have more information about .mp3s specifically, but perhaps this could still be insightful.

Good luck!

Upvotes: 3
