Reputation: 29280
I'm trying to normalize an audio file of speech.
Specifically, where an audio file contains peaks in volume, I'm trying to level it out, so the quiet sections are louder, and the peaks are quieter.
I know very little about audio manipulation, beyond what I've learnt from working on this task. Also, my math is embarrassingly weak.
I've done some research, and the Xuggle site provides a sample which shows reducing the volume using the following code: (full version here)
@Override
public void onAudioSamples(IAudioSamplesEvent event)
{
// get the raw audio byes and adjust it's value
ShortBuffer buffer = event.getAudioSamples().getByteBuffer().asShortBuffer();
for (int i = 0; i < buffer.limit(); ++i)
buffer.put(i, (short)(buffer.get(i) * mVolume));
super.onAudioSamples(event);
}
Here, they modify the bytes in getAudioSamples()
by a constant of mVolume
.
Building on this approach, I've attempted a normalisation modifies the bytes in getAudioSamples()
to a normalised value, considering the max/min in the file. (See below for details). I have a simple filter to leave "silence" alone (ie., anything below a value).
I'm finding that the output file is very noisy (ie., the quality is seriously degraded). I assume that the error is either in my normalisation algorithim, or the way I manipulate the bytes. However, I'm unsure of where to go next.
Here's an abridged version of what I'm currently doing.
Reads the full audio file, and finds this highest and lowest values of buffer.get()
for all AudioSamples
@Override
public void onAudioSamples(IAudioSamplesEvent event) {
IAudioSamples audioSamples = event.getAudioSamples();
ShortBuffer buffer =
audioSamples.getByteBuffer().asShortBuffer();
short min = Short.MAX_VALUE;
short max = Short.MIN_VALUE;
for (int i = 0; i < buffer.limit(); ++i) {
short value = buffer.get(i);
min = (short) Math.min(min, value);
max = (short) Math.max(max, value);
}
// assign of min/max ommitted for brevity.
super.onAudioSamples(event);
}
In a loop similar to step1, replace the buffer with normalized values, calling:
buffer.put(i, normalize(buffer.get(i));
public short normalize(short value) {
if (isBackgroundNoise(value))
return value;
short rawMin = // min from step1
short rawMax = // max from step1
short targetRangeMin = 1000;
short targetRangeMax = 8000;
int abs = Math.abs(value);
double a = (abs - rawMin) * (targetRangeMax - targetRangeMin);
double b = (rawMax - rawMin);
double result = targetRangeMin + ( a/b );
// Copy the sign of value to result.
result = Math.copySign(result,value);
return (short) result;
}
normalize()
valid?Upvotes: 13
Views: 9311
Reputation: 3
I've tried this answer and it didn't work for me, however here's my version with double arrays. It works 100% for me and I find it much easier to understand. Something to keep in mind, you are referring to compression. The code I'm posting here is for normalizing, which was actually what I was searching for and found this post.
This code finds out what's the maximum amplitude in the waveform and finds out much to multiply the amplitude by in order to get the highest amplitude to 1.0.
double [] normalize (double [] samples)
{
double max = 0;
for (int i = 0; i < samples.length; i++)
{
if(Math.abs(samples[i]) > max)
{
max = Math.abs(samples[i]);
}
}
double difference = 1/max;
for(int i = 0;i< samples.length;i++)
{
samples[i] = samples[i] * difference;
}
return samples;
}
For limiting, which is closure to what you are referring to, you can do the following:
double [] limit (double [] samples, double db)
{
for (int i = 0; i < samples.length; i++)
{
if (samples[i] > db)
samples[i] = db;
else if (samples[i] < -db)
samples[i] = -db;
}
return samples;
}
This will prevent the audio from being too loud at certain points but can get harsh on certain material.
Compression is actually what you're looking for.
The difference between compression and limiting is that limiting put a cap on the volume for the entire waveform, whereas compression temporary puts a cap on the waveform. It has a release and attack time.
Upvotes: 0
Reputation: 11469
"normalization" of audio is the process of increasing the level of the audio such that the maximum is equal to some given value, usually the maximum possible value. Today, in another question, someone explained how to do this (see #1): audio volume normalization
However, you go on to say "Specifically, where an audio file contains peaks in volume, I'm trying to level it out, so the quiet sections are louder, and the peaks are quieter." This is called "compression" or "limiting" (not to be confused with the type of compression such as that used in encoding MP3s!). You can read more about that here: http://en.wikipedia.org/wiki/Dynamic_range_compression
A simple compressor is not particularly hard to implement, but you say your math "is embarrassingly weak." So you might want to find one that's already built. You might be able to find a compressor implemented in http://sox.sourceforge.net/ and convert that from C to Java. The only java implementation of compressor I know of who's source is available (and it's not very good) is in this book
As an alternative to solve your problem, you might be able to normalize your file in segments of say 1/2 a second each, and then connect the gain values you use for each segment using linear interpolation. You can read about linear interpolation for audio here: http://blog.bjornroche.com/2010/10/linear-interpolation-for-audio-in-c-c.html
I don't know if the source code is available for the levelator, but that's something else you can try.
Upvotes: 5
Reputation: 4165
I don't think the concept of "minimum sample value" is very meaningful, since the sample value just represents the current "height" of the sound wave at a certain time instant. I.e. its absolute value will vary between the peak value of the audio clip and zero. Thus, having a targetRangeMin
seems to be wrong and will probably cause some distortion of the waveform.
I think a better approach might be to have some sort of weight function that decreases the sample value based on its size. I.e. bigger values are decreased by a large percentage than smaller values. This would also introduce some distortion, but probably not very noticeable.
Edit: here is a sample implementation of such a method:
public short normalize(short value) {
short rawMax = // max from step1
short targetMax = 8000;
//This is the maximum volume reduction
double maxReduce = 1 - targetMax/(double)rawMax;
int abs = Math.abs(value);
double factor = (maxReduce * abs/(double)rawMax);
return (short) Math.round((1 - factor) * value);
}
For reference, this is what your algorithm did to a sine curve with an amplitude of 10000:
This explains why the audio quality becomes much worse after being normalized.
This is the result after running with my suggested normalize
method:
Upvotes: 10