Reputation: 948
I am using the FFmpeg library to decode and (potentially) modify some audio.
I managed to use the following functions to iterate through all frames of the audio file:
avformat_open_input // Obtains formatContext
avformat_find_stream_info
av_find_best_stream // The argument AVMEDIA_TYPE_AUDIO is fed in to find the audio stream
avcodec_open2 // Obtains codecContext
av_init_packet
// The following is used to loop through the frames
av_read_frame
avcodec_decode_audio4
In the end, I have these three values available on each iteration:
int dataSize; // return value of avcodec_decode_audio4
AVFrame* frame;
AVCodecContext* codecContext; // Codec context of the best stream
I assumed that a loop like this could be used to iterate over all samples:
for (int i = 0; i < frame->nb_samples; ++i)
{
// Bytes/Sample is known to be 4
// Extracts audio from Channel 1. There are in total 2 channels.
int* sample = (int*)frame->data[0] + dataSize * i;
// Now *sample is accessible
}
However, when I plotted the data using gnuplot, I did not get a waveform as expected, and some of the values reached the limit of 32-bit integers. (The audio stream is not silent in the first few seconds.)
I suppose that some form of quantisation is going on to prevent the data from being interpreted mathematically. What should I do to de-quantise this?
Upvotes: 2
Views: 717
Reputation: 11174
for (int i = 0; i < frame->nb_samples; ++i)
{
// Bytes/Sample is known to be 4
// Extracts audio from Channel 1. There are in total 2 channels.
int* sample = (int*)frame->data[0] + dataSize * i;
// Now *sample is accessible
}
Well... No. So, first of all, we'll need to know the data type. Check frame->format. It's an enum AVSampleFormat, most likely flt, fltp, s16 or s16p.
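This would also explain the huge values in the question's plot, assuming the decoder there outputs flt or fltp (a guess, since the format isn't stated). Each 4-byte sample is then a float, and reading it through an int* gives the raw bit pattern instead of the value. A minimal sketch in plain C, with no FFmpeg calls; float_bits_as_int32 is a made-up helper name and 32-bit IEEE 754 floats are assumed:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of FFmpeg): returns the raw bit pattern of
   a float as an int32_t. This is the number you end up with when a float
   sample is read through an int pointer. Assumes 32-bit IEEE 754 floats. */
static int32_t float_bits_as_int32(float value)
{
    int32_t bits;
    memcpy(&bits, &value, sizeof bits); /* well-defined way to reinterpret bytes */
    return bits;
}
```

A quiet positive sample such as 0.5 comes out as 0x3F000000, which is over a billion, and any negative sample has the sign bit set, so it lands far down in the negative half of the int range.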
So, how do you interpret frame->data[] given the format? Well, first, is it planar or not? If it's planar, it means each channel is in frame->data[n], where n is the channel number. frame->channels is the number of channels. If it's not planar, it means all data is interleaved (per sample) in frame->data[0].
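The two layouts boil down to different offset arithmetic. Here is a rough sketch that works on plain byte buffers so it compiles without FFmpeg; sample_ptr and its parameters are made up for illustration. In real code the planar flag and sample size could come from av_sample_fmt_is_planar() and av_get_bytes_per_sample() applied to frame->format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FFmpeg): returns a pointer to the first
   byte of sample i of channel c. `data` plays the role of frame->data,
   `planar` is nonzero for planar formats, and `bytes_per_sample` is 2 for
   s16/s16p and 4 for flt/fltp. */
static const uint8_t *sample_ptr(uint8_t *const *data, int planar,
                                 int channels, int bytes_per_sample,
                                 int c, int i)
{
    if (planar) {
        /* One buffer per channel; samples of that channel are contiguous. */
        return data[c] + (size_t)i * bytes_per_sample;
    }
    /* One buffer; channels alternate sample by sample. */
    return data[0] + ((size_t)i * channels + c) * bytes_per_sample;
}
```

Note that the offset depends on the sample index, the channel count and the sample size only; the return value of the decode call does not enter into it.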
Second, what is the storage type? If it's s16/s16p, it's int16_t *. If it's flt/fltp, it's float *. So the correct interpretation for fltp would be:
for (int c = 0; c < frame->channels; c++) {
float *samples = (float *)frame->data[c]; // data[] is uint8_t *, so a cast is needed
for (int i = 0; i < frame->nb_samples; i++) {
float sample = samples[i];
// now this sample is accessible, it's in the range [-1.0, 1.0]
}
}
Whereas for s16, it would be:
int16_t *samples = (int16_t *)frame->data[0]; // data[] is uint8_t *, so a cast is needed
for (int c = 0; c < frame->channels; c++) {
for (int i = 0; i < frame->nb_samples; i++) {
int sample = samples[i * frame->channels + c];
// now this sample is accessible, it's in the range [-32768,32767]
}
}
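If you want to plot s16 data on the same scale as the float formats, one option is to pull a single channel out of the interleaved buffer and scale it. This is only a sketch: s16_channel_to_float is a made-up helper, and dividing by 32768 is one common convention that I'm assuming here, not something FFmpeg mandates.

```c
#include <stdint.h>

/* Hypothetical helper (not part of FFmpeg): copies channel c of an
   interleaved s16 buffer into out[0..nb_samples-1], scaled so that
   -32768 maps to -1.0 and 32767 maps to just under 1.0. */
static void s16_channel_to_float(const int16_t *samples, int channels,
                                 int nb_samples, int c, float *out)
{
    for (int i = 0; i < nb_samples; i++) {
        /* Same interleaved indexing as in the s16 loop above. */
        out[i] = (float)samples[i * channels + c] / 32768.0f;
    }
}
```

With a decoded frame you would pass (const int16_t *)frame->data[0], frame->channels and frame->nb_samples, and an output array with room for at least nb_samples floats.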
Upvotes: 3