Niccolò Cirone

Reputation: 61

Convert encoded audio file to text with signal values

I've been programming in C with audio files for the first time. I found this code, which is supposed to read an audio file and then write a CSV file containing information for analysing the audio waves, which in my case will be a simple voice recording: I'm interested in the amplitude of the waves, the timbre of the voice, and its pitch and range.

    #include <stdio.h>
    #include <stdint.h>

    #define N 882 // 20 ms of samples at Fs = 44.1 kHz (0.020 * 44100)

    int main(void) {
        // Create a 20 ms audio buffer (assuming Fs = 44.1 kHz)
        int16_t buf[N] = {0}; // buffer
        int n;                // buffer index

        // Open WAV file with FFmpeg and read raw samples via the pipe.
        FILE *pipein;
        pipein = popen("ffmpeg -i whistle.wav -f s16le -ac 1 -", "r");
        fread(buf, 2, N, pipein);
        pclose(pipein);

        // Print the sample values in the buffer to a CSV file
        FILE *csvfile;
        csvfile = fopen("samples.csv", "w");
        for (n=0 ; n<N ; ++n) fprintf(csvfile, "%d\n", buf[n]);
        fclose(csvfile);

        return 0;
    }

Could someone explain to me in detail how I can read an audio file so that I can extract the info I need from it? Referring to this code, could someone also explain the meaning of the pipe in this line:

pipein = popen("ffmpeg -i whistle.wav -f s16le -ac 1 -", "r");

p.s. I already know how to read the header of the audio file, which contains a lot of useful info, but I also want to analyse the entire audio file, sample by sample.

Upvotes: 3

Views: 1234

Answers (1)

Scott Stensland

Reputation: 28305

I just compiled and ran your code ... the output file samples.csv is a single column of signed 16-bit integers, one per sample of your input audio curve ... for example (your values will differ):

-20724
-19681
-18556
-17359
-16096
-14766
-13383
-11940
-10460
-8928
-7371
-5778
-4165
-2536
-897
749
2385
4019
5633
7224
8793
10318
11811
13251
14644
15977
17247

... so while that raw audio is in your variable buf, you can extend your code above to answer your questions

volume - audio is a curve, so when the curve fails to wobble it's silent ... it's critical to understand the meaning of bit depth when calculating volume ... I suggest you open the output file in a text editor and eyeball every value ... knowing you have a bit depth of 16 bits tells you the number of possible integer values ... if that draws a blank stare, read up on raw PCM audio ... to a first approximation, the following changes to your code will tell you the volume

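int min_value = 32767;   // start at the largest 16-bit value so any sample can lower it
int max_value = -32768;  // start at the smallest 16-bit value so any sample can raise it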

for (n=0 ; n < N ; ++n) {

    if (buf[n] < min_value)  min_value = buf[n];
    if (buf[n] > max_value)  max_value = buf[n];

    fprintf(csvfile, "%d\n", buf[n]);
}

fclose(csvfile);

printf("min_value %d\n", min_value);
printf("max_value %d\n", max_value);

knowing the bit depth of your audio, let's say it's 16 bits, you have 2^16 possible distinct integers ... from 0 to (65536 - 1) to represent the curve of your raw audio ... that is, if your data is unsigned ... if it's signed integers (which is what the s16le in your ffmpeg command requests) then shift that range so it's zero centered ... then the range goes from -32768 to (+32768 - 1), or -32768 to +32767 ... so if your audio buf[n] values traverse the full possible range from min to max, then that stretch of samples can be said to be at full volume ... now we are in a position to interpret the above measurements, min_value and max_value ... if min_value is around -16384 and max_value is around +16384 then the volume would be about half the maximum, since it's only consuming half of the range of possible integer values

so volume in a range from 0 to 1 ( min to max volume) can be calculated (by oversimplifying) using this formula

num_possible_ints = 2^bit_depth  // == 65536 for bit depth of 16 bits 
volume = 1 - ( num_possible_ints - ( max_value - min_value )) / num_possible_ints
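As a rough sketch, that formula could look like this in C. The function name `peak_to_peak_volume` is my own invention for illustration, not something from a library. Two details matter: `^` means XOR in C, so the power of two is written as a shift, and the division has to happen in floating point or it truncates to zero. The formula also simplifies to (max_value - min_value) / num_possible_ints, which is what the code returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: fraction (0..1) of the available integer range
// that the samples in buf actually span. bit_depth is assumed to be 16
// or less, because the samples are stored as int16_t.
double peak_to_peak_volume(const int16_t *buf, size_t count, int bit_depth)
{
    assert(buf != NULL && count > 0);
    assert(bit_depth > 0 && bit_depth <= 16);

    int min_value = buf[0];
    int max_value = buf[0];
    for (size_t i = 1; i < count; ++i) {
        if (buf[i] < min_value) min_value = buf[i];
        if (buf[i] > max_value) max_value = buf[i];
    }

    // 2^bit_depth written as a shift, since ^ means XOR in C
    double num_possible_ints = (double)(1L << bit_depth);

    // Same as 1 - (num_possible_ints - (max - min)) / num_possible_ints
    return (double)(max_value - min_value) / num_possible_ints;
}
```

With the buffer from the question this would be called as `peak_to_peak_volume(buf, N, 16)` once the `fread` has filled `buf`.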

why is this oversimplifying ? because without pre-processing your audio buffer [by discarding outlying audio samples which only rarely spike to max or min, if desired] this approach is prone to giving too high a volume measurement
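One possible way to discard those rare spikes is to take a percentile of the absolute sample values instead of the true maximum. This is only a sketch of that idea, and the percentile approach is my assumption rather than a standard recipe: the helper name `percentile_peak` and the choice of fraction (for instance 0.99) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Comparator for qsort: ascending order of int values.
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

// Hypothetical helper: returns the absolute sample value found at the
// given fraction of the sorted magnitudes (e.g. 0.99), scaled to 0..1.
// Returns -1.0 if memory cannot be allocated.
double percentile_peak(const int16_t *buf, size_t count, double fraction)
{
    assert(buf != NULL && count > 0);
    assert(fraction > 0.0 && fraction <= 1.0);

    int *magnitudes = malloc(count * sizeof *magnitudes);
    if (magnitudes == NULL) return -1.0;

    for (size_t i = 0; i < count; ++i) {
        // abs() is applied to an int, so -32768 does not overflow
        magnitudes[i] = abs((int)buf[i]);
    }
    qsort(magnitudes, count, sizeof *magnitudes, compare_ints);

    // Index of the chosen percentile, clamped to the last element
    size_t index = (size_t)(fraction * (double)count);
    if (index >= count) index = count - 1;

    double peak = magnitudes[index] / 32768.0;
    free(magnitudes);
    return peak;
}
```

A fraction of 1.0 gives back the plain peak, so lowering it trades sensitivity to isolated spikes against how much of the signal gets ignored.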

there are better measurements of volume, yet keep in mind that perceived loudness is subject to perceptual biases ... look up Root Mean Square to calculate volume with better accuracy ... to quote :

RMS is averaging the area displaced by the signal, the area between the waveform and the linear zero line (not 0dB, but the axis).

As the waveform swings both above (+) and below (-) the centreline, the polarity of the swings has to be disregarded. Luckily, in maths, anything multiplied by itself (squaring) ends up positive. The signal can then be averaged (arithmetic mean over the timeline/window ED mentions or its integration time) as the positive and negative halves won't now cancel each other out -and finally the inverse to squaring is executed -square root.

RMS just means root-mean-square or the square-root of the arithmetic mean of the square of the signal.

In reality, what it means is that a signal of high-amplitude, spikey, transient content can have the same RMS value as one of lower amplitude but fatter waveform -because they both have the same energy content. If you put them through a speaker, they should both generate the same acoustical energy output.

Typical spikey waveforms are things like drum transients, whereas fatter waveforms would be sine waves or even square waves (as fat as you can get), where a much lower peak level would be needed to have the same power (a sine wave of 1.4Vp has the same RMS level as a square wave of 1.0Vp).

... this should get you started

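PS popen runs the ffmpeg command in a shell and hands you a FILE * connected to its standard output ... the trailing - tells ffmpeg to write to stdout instead of to a file, -f s16le asks for raw signed 16-bit little-endian samples, and -ac 1 mixes down to one channel (mono) ... so each fread on pipein is a stream read of decoded samples from the input file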
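To go through the entire file sample by sample rather than only the first 20 ms, one option is to keep calling fread on the stream until it returns 0. The sketch below takes any FILE * so it works with the pipe from popen as well as with an ordinary file. The struct `stream_stats` and the function `analyse_stream` are hypothetical names, and the code assumes a little-endian host so that the bytes ffmpeg emits for s16le can be read straight into int16_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CHUNK 882 // samples per read: 20 ms at 44.1 kHz

// Hypothetical summary of a whole stream of samples.
struct stream_stats {
    long count;            // total number of samples read
    int min_value;         // smallest sample seen
    int max_value;         // largest sample seen
    double sum_of_squares; // raw sum, for computing RMS afterwards
};

// Reads raw 16-bit samples from `in` until end of stream, one chunk at
// a time. Assumes a little-endian host, matching ffmpeg's s16le output.
struct stream_stats analyse_stream(FILE *in)
{
    assert(in != NULL);

    struct stream_stats stats = {0, 32767, -32768, 0.0};
    int16_t buf[CHUNK];
    size_t got;

    // fread returns how many whole samples it delivered; 0 means done
    while ((got = fread(buf, sizeof buf[0], CHUNK, in)) > 0) {
        for (size_t i = 0; i < got; ++i) {
            if (buf[i] < stats.min_value) stats.min_value = buf[i];
            if (buf[i] > stats.max_value) stats.max_value = buf[i];
            stats.sum_of_squares += (double)buf[i] * buf[i];
        }
        stats.count += (long)got;
    }
    return stats;
}
```

With the command from the question, this would be called on the FILE * returned by popen (after checking it is not NULL), followed by pclose as before. Dividing sum_of_squares by count and taking the square root then gives the RMS over the whole file.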

Upvotes: 5
