Reputation: 5010
I am writing an application that needs to get the raw waveform data of an audio file so I can render it in an application (C#/.NET). I am using ffmpeg to offload this task, but it looks like ffmpeg can only output the waveform data as a PNG or as a stream to gnuplot.
I have looked at other libraries to do this (NAudio/CSCore), however they require Windows/Microsoft Media Foundation, and since this app is going to be deployed to Azure as a web app I cannot use them.
My strategy was to just read the waveform data from the PNG itself, but this seems hacky and over the top. The ideal output would be a fixed-sample series of peaks in an array, where each value in the array is the peak value (ranging from 1-100 or something, like this for example).
Upvotes: 5
Views: 8507
Reputation: 6762
You can use the function described in this tutorial to get the raw data decoded from an audio file as an array of double values.
Summarizing from the link: the function decode_audio_file takes 4 parameters:
- path: the path of the file to decode
- sample_rate: the desired output sample rate
- data: a pointer that will be set to the array of decoded samples
- size: a pointer that will be set to the number of samples in that array
It returns 0 upon success, and -1 in case of failure, along with an error message written to the stderr stream.
The function code is below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // for memcpy
#include <libavutil/opt.h>
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libswresample/swresample.h>

int decode_audio_file(const char* path, const int sample_rate, double** data, int* size) {

    // initialize all muxers, demuxers and protocols for libavformat
    // (does nothing if called twice during the course of one program execution)
    av_register_all();

    // get format from audio file
    AVFormatContext* format = avformat_alloc_context();
    if (avformat_open_input(&format, path, NULL, NULL) != 0) {
        fprintf(stderr, "Could not open file '%s'\n", path);
        return -1;
    }
    if (avformat_find_stream_info(format, NULL) < 0) {
        fprintf(stderr, "Could not retrieve stream info from file '%s'\n", path);
        return -1;
    }

    // find the index of the first audio stream
    int stream_index = -1;
    for (unsigned int i = 0; i < format->nb_streams; i++) {
        if (format->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO) {
            stream_index = i;
            break;
        }
    }
    if (stream_index == -1) {
        fprintf(stderr, "Could not retrieve audio stream from file '%s'\n", path);
        return -1;
    }
    AVStream* stream = format->streams[stream_index];

    // find & open codec
    AVCodecContext* codec = stream->codec;
    if (avcodec_open2(codec, avcodec_find_decoder(codec->codec_id), NULL) < 0) {
        fprintf(stderr, "Failed to open decoder for stream #%d in file '%s'\n", stream_index, path);
        return -1;
    }

    // prepare resampler: downmix to mono and convert samples to double
    struct SwrContext* swr = swr_alloc();
    av_opt_set_int(swr, "in_channel_count", codec->channels, 0);
    av_opt_set_int(swr, "out_channel_count", 1, 0);
    av_opt_set_int(swr, "in_channel_layout", codec->channel_layout, 0);
    av_opt_set_int(swr, "out_channel_layout", AV_CH_LAYOUT_MONO, 0);
    av_opt_set_int(swr, "in_sample_rate", codec->sample_rate, 0);
    av_opt_set_int(swr, "out_sample_rate", sample_rate, 0);
    av_opt_set_sample_fmt(swr, "in_sample_fmt", codec->sample_fmt, 0);
    av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_DBL, 0);
    swr_init(swr);
    if (!swr_is_initialized(swr)) {
        fprintf(stderr, "Resampler has not been properly initialized\n");
        return -1;
    }

    // prepare to read data
    AVPacket packet;
    av_init_packet(&packet);
    AVFrame* frame = av_frame_alloc();
    if (!frame) {
        fprintf(stderr, "Error allocating the frame\n");
        return -1;
    }

    // iterate through frames
    *data = NULL;
    *size = 0;
    while (av_read_frame(format, &packet) >= 0) {
        // decode one frame
        int gotFrame;
        if (avcodec_decode_audio4(codec, frame, &gotFrame, &packet) < 0) {
            av_packet_unref(&packet);
            break;
        }
        if (!gotFrame) {
            av_packet_unref(&packet);
            continue;
        }
        // resample the frame to mono double samples
        // (frame->nb_samples is enough room as long as sample_rate does not exceed the source rate)
        double* buffer;
        av_samples_alloc((uint8_t**) &buffer, NULL, 1, frame->nb_samples, AV_SAMPLE_FMT_DBL, 0);
        int frame_count = swr_convert(swr, (uint8_t**) &buffer, frame->nb_samples, (const uint8_t**) frame->data, frame->nb_samples);
        // append the resampled samples to the output array
        *data = (double*) realloc(*data, (*size + frame_count) * sizeof(double));
        memcpy(*data + *size, buffer, frame_count * sizeof(double));
        *size += frame_count;
        av_freep(&buffer);
        av_packet_unref(&packet);
    }

    // clean up
    av_frame_free(&frame);
    swr_free(&swr);
    avcodec_close(codec);
    avformat_close_input(&format);

    // success
    return 0;
}
You will need the following flags to compile a program that uses this function:
-lavcodec-ffmpeg -lavformat-ffmpeg -lavutil -lswresample
Depending on your system and installation, it could also be:
-lavcodec -lavformat -lavutil -lswresample
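For example, assuming the code above is saved as decode_audio.c and the plain library names apply on your system, compilation might look like this:
gcc -std=c99 decode_audio.c -o decode_audio -lavcodec -lavformat -lavutil -lswresample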
An example of its usage is below:
int main(int argc, char const *argv[]) {

    // check parameters
    if (argc < 2) {
        fprintf(stderr, "Please provide the path to an audio file as first command-line argument.\n");
        return -1;
    }

    // decode data
    int sample_rate = 44100;
    double* data;
    int size;
    if (decode_audio_file(argv[1], sample_rate, &data, &size) != 0) {
        return -1;
    }

    // sum data
    double sum = 0.0;
    for (int i = 0; i < size; ++i) {
        sum += data[i];
    }

    // display result and exit cleanly
    printf("sum is %f\n", sum);
    free(data);
    return 0;
}
Upvotes: 1
Reputation: 15871
Hello Budi,
I wrote about the manual way to get the waveform, but then, to show you an example, I found this code which does what you want (or at the least, you can learn something from it).
1) Use FFmpeg to get an array of samples
Try the example code shown here: http://blog.wudilabs.org/entry/c3d357ed/?lang=en-US
Experiment with it, and try tweaking it with suggestions from the FFmpeg manual, etc. In that code, just change string path to point to your own file path. Edit the proc.StartInfo.Arguments section so that the last part looks like:
proc.StartInfo.Arguments = "-i \"" + path + "\" -vn -ac 1 -filter:a aresample=myNum -map 0:a -c:a pcm_s16le -f data -";
(In those arguments, -vn ignores any video stream, -ac 1 downmixes to mono, -c:a pcm_s16le outputs raw 16-bit little-endian PCM samples, and -f data - streams the raw bytes to stdout.)
The myNum in the aresample=myNum part is calculated by:
44100 * total seconds = X
myNum = X / waveform width
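For instance, plugging in hypothetical numbers (a 60-second file rendered into an 800-pixel-wide waveform), the calculation in C# would be:
double totalSeconds = 60;   // hypothetical clip length in seconds
int waveformWidth = 800;    // hypothetical waveform width in pixels
int myNum = (int)(44100 * totalSeconds / waveformWidth); // = 3307, plugged in as aresample=3307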
Finally, use the ProcessBuffer function with this logic:
static void ProcessBuffer(byte[] buffer, int length)
{
    float val;         // amplitude value of a sample
    int index = 0;     // position within the sample bytes
    int slicePos = 0;  // horizontal (X-axis) position for pixels of the next slice

    while (index + sizeof(short) <= length) // stop before a partial sample at the end
    {
        val = BitConverter.ToInt16(buffer, index);
        index += sizeof(short);

        // use the number in val to do something...
        // eg: draw a line on a canvas for part of the waveform's pixels
        // eg: myBitmap.SetPixel(slicePos, val, Color.Green);
        slicePos++;
    }
}
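Since the linked post is what wires ffmpeg's stdout into that function (and links rot), here is a minimal sketch of that wiring, assuming ffmpeg is on the PATH, that ProcessBuffer above sits in the same class, and with myNum hard-coded as a placeholder; the exact Process setup is my assumption, not verbatim from the post:
using System;
using System.Diagnostics;

class WaveformApp
{
    static void Main(string[] args)
    {
        string path = args[0]; // audio file to analyze
        int myNum = 3307;      // placeholder; compute it from duration and width as above

        var proc = new Process();
        proc.StartInfo.FileName = "ffmpeg";
        proc.StartInfo.Arguments = "-i \"" + path + "\" -vn -ac 1 -filter:a aresample=" + myNum +
                                   " -map 0:a -c:a pcm_s16le -f data -";
        proc.StartInfo.UseShellExecute = false;        // required for stream redirection
        proc.StartInfo.RedirectStandardOutput = true;  // raw PCM arrives on stdout
        proc.Start();

        var pcmStream = proc.StandardOutput.BaseStream;
        byte[] buffer = new byte[4096];
        int read;
        while ((read = pcmStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // note: a chunk can end mid-sample; production code should carry
            // a leftover odd byte over to the next chunk
            ProcessBuffer(buffer, read);
        }
        proc.WaitForExit();
    }

    // paste ProcessBuffer from above here
}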
If you want to do it manually without FFmpeg, you could try...
2) Decode audio to PCM
You could just load the audio file (MP3) into your app and first decode it to PCM (ie: raw digital audio). Then read just the PCM numbers to make the waveform. Don't read the numbers directly from the bytes of a compressed format like MP3.
These PCM data values (audio amplitudes) go into a byte array. If your sound is 16-bit, then you extract each PCM value by reading each sample as a short (ie: getting the value of two consecutive bytes at once, since 16 bits == 2 bytes in length).
Basically, when you have 16-bit audio PCM inside a byte array, every two bytes represents an audio sample's amplitude value. This value becomes your height (loudness) at each slice. A slice is a 1-pixel vertical line at one point in time in the waveform.
Now, the sample rate means how many samples there are per second, usually 44100 samples (44.1 kHz). You can see that using 44 thousand pixels to represent one second of sound is not feasible, so divide the total number of samples (44100 × total seconds) by the required waveform width. Take the result and multiply it by 2 (to cover the two bytes of each sample), and that is how many bytes you jump forward each time you sample an amplitude as you build the waveform. Do this in a loop, as in the sketch below.
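A minimal sketch of that loop, assuming mono 16-bit little-endian PCM is already sitting in a byte array. The method name and the 0-100 scaling are my own choices, and instead of picking a single value per jump this version scans each slice for its loudest sample, which matches the "peaks" the question asks for:
using System;

static class Waveform
{
    // Reduce mono 16-bit little-endian PCM bytes to `width` peak values scaled 0-100.
    static int[] ExtractPeaks(byte[] pcm, int width)
    {
        int totalSamples = pcm.Length / 2;  // 2 bytes per 16-bit sample
        int samplesPerSlice = Math.Max(1, totalSamples / width);
        int[] peaks = new int[width];

        for (int slice = 0; slice < width; slice++)
        {
            int start = slice * samplesPerSlice;
            int peak = 0;
            // scan every sample in this slice and keep the loudest one
            for (int s = start; s < start + samplesPerSlice && (s * 2 + 1) < pcm.Length; s++)
            {
                short val = BitConverter.ToInt16(pcm, s * 2);
                // clamp before Abs so short.MinValue cannot overflow
                int mag = Math.Abs(Math.Max((int)val, -short.MaxValue));
                if (mag > peak) peak = mag;
            }
            peaks[slice] = peak * 100 / short.MaxValue; // scale to 0-100
        }
        return peaks;
    }
}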
Upvotes: 5