Lutando

Reputation: 5010

Get waveform data from audio file using FFmpeg

I am writing a C#/.NET application that needs to get the raw waveform data of an audio file so I can render it. I am using FFmpeg to offload this task, but it looks like FFmpeg can only output the waveform data as a PNG or as a stream to gnuplot.

I have looked at other libraries to do this (NAudio/CSCore), however they require Windows/Microsoft Media Foundation, and since this app is going to be deployed to Azure as a web app I cannot use them.

My fallback strategy was to just read the waveform data from the PNG itself, but this seems hacky and over the top. The ideal output would be a fixed-size array of sampled peaks, where each value in the array is the peak value (ranging from 1-100 or something).

Upvotes: 5

Views: 8507

Answers (2)

Mathieu Rodic

Reputation: 6762

You can use the function described in this tutorial to get the raw data decoded from an audio file as an array of double values.

Summarizing from the link:

The function decode_audio_file takes 4 parameters:

  • path: the path of the file to decode
  • sample_rate: the desired sample rate for the output data
  • data: a pointer to a pointer to double precision values, where the extracted data will be stored
  • size: a pointer to the length of the final extracted values array (number of samples)

It returns 0 on success and -1 on failure, with an error message written to the stderr stream.

The function code is below:

#include <stdio.h>
#include <stdlib.h>
#include <string.h> // for memcpy
 
#include <libavutil/opt.h>
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libswresample/swresample.h>
 
 
int decode_audio_file(const char* path, const int sample_rate, double** data, int* size) {
 
    // initialize all muxers, demuxers and protocols for libavformat
    // (does nothing if called twice during the course of one program execution)
    av_register_all();
 
    // get format from audio file
    AVFormatContext* format = avformat_alloc_context();
    if (avformat_open_input(&format, path, NULL, NULL) != 0) {
        fprintf(stderr, "Could not open file '%s'\n", path);
        return -1;
    }
    if (avformat_find_stream_info(format, NULL) < 0) {
        fprintf(stderr, "Could not retrieve stream info from file '%s'\n", path);
        return -1;
    }
 
    // Find the index of the first audio stream
    int stream_index = -1;
    for (int i=0; i<format->nb_streams; i++) {
        if (format->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO) {
            stream_index = i;
            break;
        }
    }
    if (stream_index == -1) {
        fprintf(stderr, "Could not retrieve audio stream from file '%s'\n", path);
        return -1;
    }
    AVStream* stream = format->streams[stream_index];
 
    // find & open codec
    AVCodecContext* codec = stream->codec;
    if (avcodec_open2(codec, avcodec_find_decoder(codec->codec_id), NULL) < 0) {
        fprintf(stderr, "Failed to open decoder for stream #%u in file '%s'\n", stream_index, path);
        return -1;
    }
 
    // prepare resampler
    struct SwrContext* swr = swr_alloc();
    av_opt_set_int(swr, "in_channel_count",  codec->channels, 0);
    av_opt_set_int(swr, "out_channel_count", 1, 0);
    av_opt_set_int(swr, "in_channel_layout",  codec->channel_layout, 0);
    av_opt_set_int(swr, "out_channel_layout", AV_CH_LAYOUT_MONO, 0);
    av_opt_set_int(swr, "in_sample_rate", codec->sample_rate, 0);
    av_opt_set_int(swr, "out_sample_rate", sample_rate, 0);
    av_opt_set_sample_fmt(swr, "in_sample_fmt",  codec->sample_fmt, 0);
    av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_DBL,  0);
    swr_init(swr);
    if (!swr_is_initialized(swr)) {
        fprintf(stderr, "Resampler has not been properly initialized\n");
        return -1;
    }
 
    // prepare to read data
    AVPacket packet;
    av_init_packet(&packet);
    AVFrame* frame = av_frame_alloc();
    if (!frame) {
        fprintf(stderr, "Error allocating the frame\n");
        return -1;
    }
 
    // iterate through frames
    *data = NULL;
    *size = 0;
    while (av_read_frame(format, &packet) >= 0) {
        // decode one frame
        int gotFrame;
        int ret = avcodec_decode_audio4(codec, frame, &gotFrame, &packet);
        av_packet_unref(&packet); // release the packet allocated by av_read_frame
        if (ret < 0) {
            break;
        }
        if (!gotFrame) {
            continue;
        }
        // resample frames; size the output buffer for the worst case, since
        // the requested output rate may be higher than the input rate
        double* buffer;
        int out_samples = (int) av_rescale_rnd(
            swr_get_delay(swr, codec->sample_rate) + frame->nb_samples,
            sample_rate, codec->sample_rate, AV_ROUND_UP);
        av_samples_alloc((uint8_t**) &buffer, NULL, 1, out_samples, AV_SAMPLE_FMT_DBL, 0);
        int frame_count = swr_convert(swr, (uint8_t**) &buffer, out_samples, (const uint8_t**) frame->data, frame->nb_samples);
        // append resampled frames to data
        *data = (double*) realloc(*data, (*size + frame_count) * sizeof(double));
        memcpy(*data + *size, buffer, frame_count * sizeof(double));
        *size += frame_count;
        av_freep(&buffer); // free the temporary resample buffer
    }
 
    // clean up
    av_frame_free(&frame);
    swr_free(&swr);
    avcodec_close(codec);
    avformat_close_input(&format); // also frees the context
 
    // success
    return 0;
 
}

You will need the following linker flags to compile a program that uses it: -lavcodec-ffmpeg -lavformat-ffmpeg -lavutil -lswresample. Depending on your system and installation, it could instead be: -lavcodec -lavformat -lavutil -lswresample (for example: gcc your_program.c -o your_program -lavcodec -lavformat -lavutil -lswresample).

Its usage is shown below:

int main(int argc, char const *argv[]) {
 
    // check parameters
    if (argc < 2) {
        fprintf(stderr, "Please provide the path to an audio file as first command-line argument.\n");
        return -1;
    }
 
    // decode data
    int sample_rate = 44100;
    double* data;
    int size;
    if (decode_audio_file(argv[1], sample_rate, &data, &size) != 0) {
        return -1;
    }
 
    // sum data
    double sum = 0.0;
    for (int i=0; i<size; ++i) {
        sum += data[i];
    }
 
    // display result and exit cleanly
    printf("sum is %f", sum);
    free(data);
    return 0;
}

Upvotes: 1

VC.One

Reputation: 15871

Hello buddy,

I wrote about the manual way to get the waveform (part 2 below), but then, to show you an example, I found this code which does what you want (or at the least, you can learn something from it).

1) Use FFmpeg to get array of samples

Try the example code shown here : http://blog.wudilabs.org/entry/c3d357ed/?lang=en-US

Experiment with it, and try tweaking it with suggestions from the FFmpeg manual, etc. In the code shown there, just change string path to point to your own file path. Edit the proc.StartInfo.Arguments section so that its last part looks like:

proc.StartInfo.Arguments = "-i \"" + path + "\" -vn -ac 1 -filter:a aresample=myNum -map 0:a -c:a pcm_s16le -f data -";
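In case that linked post ever disappears: below is a minimal sketch of the Process setup the snippet revolves around (my own illustrative reconstruction, not the blog's exact code; the class name, buffer size and chunk handling are assumptions):

using System;
using System.Diagnostics;

class WaveformReader
{
    static void Main(string[] args)
    {
        string path = args[0]; // path to your audio file
        int myNum = 4410;      // resample rate; how to choose it is explained below

        var proc = new Process();
        proc.StartInfo.FileName = "ffmpeg"; // assumes ffmpeg is on the PATH
        // mono (-ac 1), resampled, raw signed 16-bit little-endian PCM to stdout
        proc.StartInfo.Arguments = "-i \"" + path + "\" -vn -ac 1 -filter:a aresample=" + myNum +
                                   " -map 0:a -c:a pcm_s16le -f data -";
        proc.StartInfo.UseShellExecute = false;
        proc.StartInfo.RedirectStandardOutput = true;
        proc.Start();

        // read the raw PCM stream in chunks and hand each chunk to ProcessBuffer
        var pcm = proc.StandardOutput.BaseStream;
        byte[] buffer = new byte[8192];
        int read;
        while ((read = pcm.Read(buffer, 0, buffer.Length)) > 0)
        {
            // drop a possible trailing half-sample; real code should carry
            // the leftover byte over into the next chunk instead
            ProcessBuffer(buffer, read - (read % 2));
        }
        proc.WaitForExit();
    }

    static void ProcessBuffer(byte[] buffer, int length) { /* shown further below */ }
}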

That myNum in the aresample=myNum part is calculated as:

44100 * total seconds = X
myNum = X / waveform width
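For example (illustrative numbers, not from the post): a 60-second file gives X = 44100 * 60 = 2,646,000 samples, so for a 600-pixel-wide waveform, myNum = 2,646,000 / 600 = 4410.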

Finally, use the ProcessBuffer function with this logic:

static void ProcessBuffer(byte[] buffer, int length)
{
    float val; //amplitude value of a sample
    int index = 0; //position within sample bytes
    int slicePos = 0; //horizontal (X-axis) position for pixels of next slice

    while (index + sizeof(short) <= length) //stop before a truncated sample
    {
        //each 16-bit sample is two consecutive little-endian bytes
        val = BitConverter.ToInt16(buffer, index);
        index += sizeof(short);

        // use number in val to do something...
        // eg: Draw a line on canvas for part of waveform's pixels
        // eg: myBitmap.SetPixel(slicePos, val, Color.Green);

        slicePos++;
    }
}

If you want to do it manually without FFmpeg, you could try...

2) Decode audio to PCM
You could just load the audio file (eg: MP3) into your app and first decode it to PCM (ie: raw digital audio). Then read just the PCM numbers to make the waveform. Don't read numbers directly from the bytes of a compressed format like MP3.

These PCM data values (audio amplitudes) go into a byte array. If your sound is 16-bit, then you extract each PCM value by reading one sample as a short (ie: getting the value of two consecutive bytes at once, since 16 bits == 2 bytes).

Basically, when you have 16-bit audio PCM inside a byte array, every two bytes represents an audio sample's amplitude value. This value becomes your height (loudness) at each slice. A slice is a 1-pixel-wide vertical line at one point in time of the waveform.

Now sample rate means how many samples per second, usually 44100 samples (44.1 kHz). You can see that using 44 thousand pixels to represent one second of sound is not feasible, so divide the total samples by the required waveform width. Take that result and multiply it by 2 (to cover the two bytes of each sample), and that is how far you jump-&-sample the amplitudes as you form the waveform. Do this in a while loop, as sketched below.
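To make that jump-&-sample idea concrete, here is a minimal sketch (the names ExtractPeaks, pcm and width are mine, purely illustrative) that turns mono 16-bit little-endian PCM bytes into one peak value per waveform slice, scaled to the 1-100 range the question asked for:

// Minimal sketch: one peak per waveform slice from mono 16-bit PCM.
static int[] ExtractPeaks(byte[] pcm, int width)
{
    int totalSamples = pcm.Length / 2;                // 2 bytes per 16-bit sample
    int samplesPerSlice = Math.Max(1, totalSamples / width);
    int[] peaks = new int[width];

    for (int slice = 0; slice < width; slice++)
    {
        int start = slice * samplesPerSlice;
        int max = 0;
        for (int s = start; s < start + samplesPerSlice && s < totalSamples; s++)
        {
            // each sample is two consecutive bytes, so the byte jump is s * 2
            int amp = Math.Abs((int) BitConverter.ToInt16(pcm, s * 2));
            if (amp > max) max = amp;
        }
        peaks[slice] = max * 100 / 32768;             // scale to 1-100
    }
    return peaks;
}

Taking the largest magnitude in each slice (rather than just the first sample) keeps short transients visible in the rendered waveform.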

Upvotes: 5
