How to do audio expansion/normalization (emphasise difference between high and low)

Question

I'm trying to find a way to emphasise the difference between high and low points in the audio. I can't seem to find documentation on how to do this - perhaps this can be done with ffmpeg. Would really appreciate some pointers from someone who knows more about signal processing.

kibibu · Accepted Answer

Fundamentally, an expander is the opposite of a compressor *; you may have more luck finding documentation about how to implement those. They also have a lot in common with a noise gate.

The expander

The basic approach is to implement an envelope follower, and use the value of the envelope to scale the audio source. An envelope follower tries to track the amplitude of the audio signal.

A basic pythonic pseudocode framework looks a bit like this:

envelope_follower e          # some envelope follower, we'll replace this

for each sample x:
   amplitude = e.get(x)      # use the envelope follower to estimate the amplitude of x
   x = expand(x, amplitude)  # apply some expansion operation

At its most basic, the expand operation looks like this (assuming your samples are between -1.0 and 1.0):

def expand(x, amplitude):
    return x * amplitude

There are more sophisticated approaches, for example clamping and scaling amplitude so it never drops below 0.5, or applying some non-linear function to amplitude before multiplying.

# just an example
def expand(x, amplitude):
    return x * clamp(1.2 * amplitude - 0.2 * (amplitude * amplitude), 0.3, 1.0)

Envelope followers

The quality of the compressor/expander depends almost entirely on how you implement the envelope follower. It's not an exact science, as a very accurate envelope follower might cause some nasty audible effects in certain situations - there are trade-offs to be made.

As with all of these things, there are a lot of approaches! Here's a couple:

Filtered rectifier

One of the simplest approaches - particularly if you already have a library of signal processing blocks - is the lowpass-filtered rectifier.

It works like this:

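class envelope_follower:
    def __init__(self, lowpass_filter):
        # any lowpass filter object that exposes process(sample)
        self.filter = lowpass_filter

    def get(self, x):
        # rectify with abs(), then smooth with the lowpass
        return self.filter.process(abs(x))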

The controls you get here are basically around your filter design, and lowpass cutoff. Using a simple leaky-accumulator filter will get you a long way.
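As a rough sketch of the leaky-accumulator idea, the filter below is a one-pole lowpass that moves a fraction of the way toward its input on every sample. The class names and the `coeff` parameter are made up for illustration, and `coeff=0.01` is an arbitrary starting point rather than a recommended value.

```python
class LeakyAccumulator:
    """One-pole lowpass: each call moves the state a fraction toward the input."""

    def __init__(self, coeff=0.01):
        # coeff is a hypothetical smoothing factor in (0, 1];
        # larger values track faster but let more ripple through.
        self.coeff = coeff
        self.state = 0.0

    def process(self, sample):
        self.state += self.coeff * (sample - self.state)
        return self.state


class FilteredRectifier:
    """Envelope follower: rectify with abs(), then smooth with the lowpass."""

    def __init__(self, lowpass):
        self.lowpass = lowpass

    def get(self, x):
        return self.lowpass.process(abs(x))
```

With `FilteredRectifier(LeakyAccumulator(0.01))`, the envelope rises and falls at the same rate, which is exactly the limitation the next section tries to address.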

Attack-Release follower

People usually want more control over their expander, and it's sometimes difficult to think about the actual effects of the filtered rectifier - tweaking one parameter might change a lot of its behaviour.

For real-world signals, it's often desirable for compressors/expanders to respond very quickly (for example, to a piano or drum impact) and to release slowly (so the tail of the piano note isn't suddenly cut off).

An Attack-Release follower gives more precise control by specifying a number of parameters:

Two thresholds:
- A threshold, above which sounds should be made louder
- A threshold, below which sounds should be made quieter (not necessarily the same threshold, but can be!)
Two time periods:
- How long it takes to reach full loudness when the first threshold is crossed (This is the Attack parameter)
- How long it takes to reach quietness (Release)

One basic approach to implementing one of these is:

class envelope_follower:
    # Parameters required. These are just made up
    attack_threshold = 0.6
    release_threshold = 0.3
    attack_time = 10       # in samples
    release_time = 1000    # in samples

    def __init__(self):
        self.amp = 0.0     # current envelope value, kept in [0.0, 1.0]

    def get(self, x):
        # we still work with the absolute value.
        # You might use another measure of amplitude here like RMS
        # or even the filtered rectifier above
        a = abs(x)

        if a > self.attack_threshold:
            self.amp += 1.0 / self.attack_time     # ramp up quickly
        elif a < self.release_threshold:
            self.amp -= 1.0 / self.release_time    # ramp down slowly

        # clamp to [0.0, 1.0]
        self.amp = min(max(self.amp, 0.0), 1.0)

        return self.amp

One common extension to this type of follower is to add a Hold parameter, that specifies a minimum length of time that the expander should be wide open. This avoids the envelope creating an audible triangle or sawtooth wave on lower-frequency signals.
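Here is one way a Hold stage might be bolted onto an attack-release follower. This is a sketch under one particular interpretation: the hold timer is re-armed by every sample above the attack threshold, and release cannot start until it runs out. The class name, `hold_time`, and all default values are made up.

```python
class HoldFollower:
    """Attack-release follower with a hold stage (all times are in samples)."""

    def __init__(self, attack_threshold=0.6, release_threshold=0.3,
                 attack_time=10, hold_time=200, release_time=1000):
        self.attack_threshold = attack_threshold
        self.release_threshold = release_threshold
        self.attack_step = 1.0 / attack_time
        self.release_step = 1.0 / release_time
        self.hold_time = hold_time
        self.hold_left = 0      # samples of hold remaining
        self.amp = 0.0          # current envelope value, kept in [0.0, 1.0]

    def get(self, x):
        a = abs(x)
        if a > self.attack_threshold:
            self.amp += self.attack_step
            self.hold_left = self.hold_time   # re-arm the hold on every loud sample
        elif self.hold_left > 0:
            self.hold_left -= 1               # stay open, do not release yet
        elif a < self.release_threshold:
            self.amp -= self.release_step
        self.amp = min(max(self.amp, 0.0), 1.0)
        return self.amp
```

Because a low-frequency wave only spends part of each cycle above the threshold, a hold longer than one cycle keeps the envelope from dipping between peaks.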

An even more sophisticated approach is to do a full Attack-Decay-Sustain-Release, which lets you control the transients and is commonly used as a drum treatment.

Getting wild

From here, you can:

Create a smoother expand function
Fairly trivially adjust the above into a Compander - a combination device that quietens low sounds, but also quietens overly-loud sounds;
Split a signal into multiple frequency bands, and compress/expand each one separately. This is commonly done to get real in-your-face maximum amplitude during music mastering;
Adjust Attack/Hold/Release based on the spectral content of the sound you're expanding. Very short attack/release times are fine for high-frequency signals, but sound awful for low-frequency signals;
Add a mild saturating distortion for sounds over the threshold; this can make things perceptually louder, even if the signal still has the same maximum amplitude. You ideally want a saturator that doesn't affect signals under the threshold at all.
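As an illustration of the last idea, here is a sketch of a saturator that leaves everything under the threshold untouched and soft-clips what is above it. Using `tanh` for the curve is just one possible choice, and the function name and the default `threshold=0.7` are assumptions; the threshold has to be below 1.0 for this to work.

```python
import math


def saturate(x, threshold=0.7):
    """Soft-clip x above threshold; pass it through unchanged below.

    Assumes samples are nominally in [-1.0, 1.0] and threshold < 1.0.
    """
    magnitude = abs(x)
    if magnitude <= threshold:
        return x                      # untouched under the threshold
    headroom = 1.0 - threshold        # space between threshold and full scale
    # tanh has slope 1 at 0, so the curve joins the linear part smoothly
    # and then flattens out as it approaches full scale.
    shaped = threshold + headroom * math.tanh((magnitude - threshold) / headroom)
    return math.copysign(shaped, x)
```

The curve is symmetric, so positive and negative peaks are squashed by the same amount.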

Good luck!

* not to be confused with MP3-style compression. A compressor squishes the dynamic range.