Reputation: 1
I want to raise or lower the pitch of an audio file (.wav format) without using external libraries. Could anyone suggest a solution? Many thanks.
Upvotes: 0
Views: 600
Reputation: 7910
The basic steps are as follows:
Convert the audio data to PCM (I prefer using signed normalized floats).
Create a cursor variable that will advance through the PCM array. At each cursor location you will take a PCM value and place it in an array for playback.
The increment by which the cursor advances will determine the new playback rate. Examples: if you simply iterate by one through the PCM array, the new PCM array will have the same data and will play back at the same rate. If you increment by 2, the new PCM array will have half the data points and play at twice the rate. If you advance by 1.5, the new PCM array will play 1.5 times faster than the original.
Use linear interpolation when your cursor lands in between two data points. Example: if your cursor increments by 1.75, and the first three data points in the PCM array are 0, 0.2, 0.4, your second "new PCM array" value (cursor = 1.75) will be derived from calculating a point in between the [1] and [2] PCM array values of 0.2 and 0.4. (I am counting the array as starting at [0].) Note that the cursor must be rounded down (floor), not rounded to the nearest integer, to find the lower of the two neighbors. In this example, the new PCM value would be the following:
newPCM = (PCM[floor(cursor)] * (1 - decimalPartOfCursor)) + PCM[floor(cursor) + 1] * (decimalPartOfCursor)
or
newPCM = (0.2 * 0.25) + (0.4 * 0.75)
Convert the new PCM array back to a byte stream for playback.
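The steps above can be sketched in Python using only the standard library. This is a minimal sketch, not a definitive implementation: the function names (`bytes_to_floats`, `resample_linear`, `floats_to_bytes`) are hypothetical, and it assumes mono, 16-bit signed little-endian PCM (the most common .wav layout). For stereo data, each channel would need to be resampled separately.

```python
import math
import struct

def bytes_to_floats(raw):
    """Decode 16-bit signed little-endian PCM bytes into floats in [-1.0, 1.0)."""
    count = len(raw) // 2
    ints = struct.unpack("<%dh" % count, raw[:count * 2])
    return [v / 32768.0 for v in ints]

def resample_linear(pcm, step):
    """Walk a fractional cursor through pcm, interpolating between neighbors."""
    out = []
    last = len(pcm) - 1
    cursor = 0.0
    while cursor <= last:
        i = math.floor(cursor)                     # sample at or before the cursor
        frac = cursor - i                          # decimal part of the cursor
        nxt = pcm[i + 1] if i < last else pcm[i]   # no neighbor past the end
        out.append(pcm[i] * (1.0 - frac) + nxt * frac)
        cursor += step
    return out

def floats_to_bytes(samples):
    """Encode floats back to 16-bit signed little-endian PCM, clamping overflow."""
    ints = [max(-32768, min(32767, int(round(s * 32768.0)))) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)
```

With the example data, `resample_linear([0.0, 0.2, 0.4], 1.75)` yields two values, 0 and approximately 0.35, which matches the hand calculation. A `step` above 1 raises the pitch and shortens the audio; a `step` below 1 lowers the pitch and lengthens it.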
Upvotes: 1
Reputation: 51835
In addition to Phil Freihofner's answer: if you want to leave the duration as is, you need to process your data as follows:
convert your PCM data from time domain to frequency domain
So take a chunk of your PCM data (use a power-of-2 number of samples for the window so that you can use an FFT, for example 1024 samples) and apply the FFT to it.
shift the frequencies
Each entry in the FFT output represents the amplitude and phase (as a complex-domain phasor) of one frequency up to the Nyquist frequency, so simply throw away the low frequencies, replace them with the values from the higher frequencies, and zero pad the rest. Beware that the FFT result of a real-domain input is symmetric, meaning "all" the points are there twice.
Here is an example of an 8-point FFT result:
real PCM: { 0, 1, 2, 3, 4, 5, 6, 7 }
cplx DFFT: { 7, 0, -1, -2.414214, -1, -1, -1, -0.4142136, -1, 0, -1, 0.4142136, -1, 1, -1, 2.414214 }
The first entry (7, 0) of the DFFT result is the DC offset, and then there are 2 mirrors for each frequency up to the Nyquist frequency:
7, 0, // DC offset
-1, -2.414214, // 0
-1, -1, // 1
-1, -0.4142136,// 2
-1, 0, // 3
-1, 0.4142136, // 2
-1, 1, // 1
-1, 2.414214 // 0
So in order to lower the pitch, shift the mirrors towards the lower frequencies and zero pad:
7, 0, // DC offset
-1, -1, // 1
-1, -0.4142136,// 2
-1, 0, // 3
0, 0, // zero pad
-1, 0, // 3
-1, 0.4142136, // 2
-1, 1, // 1
To increase the pitch, shift in the opposite direction, zero pad as well, and leave the highest frequency as is (it should be empty):
7, 0, // DC offset
0, 0, // zero pad
-1, -2.414214, // 0
-1, -1, // 1
-1, 0, // 3 this will probably be (0,0), because when increasing the frequency the original signal would lack high frequencies
-1, 1, // 1
-1, 2.414214 // 0
0, 0, // zero pad
reconstruct PCM by converting back to time domain
So simply apply the inverse FFT to the shifted data. This produces the PCM you want.
repeat this for the whole PCM input
This is not my cup of tea, but some smoothing may be needed along the boundaries of the individual PCM chunks because of the frequency shifts. To avoid this you can either do the FFT/iFFT on the whole PCM data at once (which might be a problem if the data is too big), or overlap the chunks a bit and blend them together with a weight map; however, this might produce glitches in the sound.
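The bin-shifting steps above can be sketched in pure Python. This is only an illustrative sketch under stated assumptions: the names `fft`, `ifft`, `shift_bins` and `pitch_shift_chunk` are hypothetical, the chunk length must be a power of 2, and the transform uses the usual unnormalized forward convention, so its raw values differ from the listing above by a scale factor and by the sign of the imaginary part. Unlike the example above, it zeroes the highest (Nyquist) bin in the output instead of leaving it as is, and it does no blending between chunks.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT, scaled by 1/n so that ifft(fft(x)) returns x."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]

def shift_bins(spectrum, shift):
    """Move each frequency bin by `shift` positions (negative lowers the pitch)."""
    n = len(spectrum)
    half = n // 2
    out = [0j] * n                        # bins that receive nothing stay zero padded
    out[0] = spectrum[0]                  # keep the DC offset
    for k in range(1, half):              # bins below the Nyquist bin
        src = k - shift                   # bin that moves into position k
        if 1 <= src <= half:
            out[k] = spectrum[src]
            out[n - k] = spectrum[src].conjugate()   # mirrored bin
    return out

def pitch_shift_chunk(chunk, shift):
    """FFT one chunk of real PCM samples, shift its bins, and transform back."""
    spectrum = fft([complex(s) for s in chunk])
    return [v.real for v in ifft(shift_bins(spectrum, shift))]
```

For a chunk of N samples recorded at sample rate fs, each bin is fs/N Hz wide, so a shift of s bins moves every frequency component by s * fs/N Hz. For example, `pitch_shift_chunk(chunk, -1)` on a 1024-sample chunk moves everything down by one bin.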
Upvotes: 1
Reputation: 94
You can play it faster to get a higher pitch. But if you want to keep the same playback speed and change the pitch, this is MUCH more complicated, and you need an "Alvin and the Chipmunks"-style sound processing algorithm.
https://en.wikipedia.org/wiki/Audio_time_stretching_and_pitch_scaling
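The simple "play it faster" approach can be sketched with Python's standard `wave` module: copy the frames unchanged and only declare a different sample rate in the header. This is a rough sketch, and the function name `change_playback_rate` and its parameters are hypothetical. It changes speed and pitch together, so it does not solve the constant-speed case.

```python
import wave

def change_playback_rate(src_path, dst_path, factor):
    """Copy a .wav file, scaling the declared sample rate by `factor`.

    factor > 1 plays faster and higher; factor < 1 plays slower and lower.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # Override only the frame rate; the sample data itself is untouched.
        dst.setframerate(int(round(params.framerate * factor)))
        dst.writeframes(frames)
```

For example, `change_playback_rate("in.wav", "out.wav", 1.5)` writes a file that plays 1.5 times faster, with the pitch raised accordingly. Some players or devices may not support unusual sample rates, in which case resampling the data itself is the safer route.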
Upvotes: 0