Realtime audio application, improving performance

Question

I am currently writing a C++ real-time audio application which roughly consists of:

reading frames from a buffer
interpolating frames with Hermite interpolation
filtering every frame with two biquad filters (and updating their coefficients every frame)
a 3-band crossover containing 18 biquad calculations
the FreeVerb algorithm from the STK library

I think my PC should be able to handle this, but I get buffer underflows every so often, so I would like to improve the performance of my application. I have a bunch of questions I hope you can answer. :)

1) Operator Overloading

Instead of working directly with my float samples and doing calculations for every sample, I pack my floats in a Frame class which contains the left and the right sample. The class overloads some operators for addition, subtraction and multiplication with a float.

The filters (mostly biquads) and the reverb work with floats and don't use this class, but the Hermite interpolator and every multiplication and addition for volume control and mixing use the class.

Does this have an impact on performance, and would it be better to work with the left and right samples directly?
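
For illustration, a class of the kind described above could look roughly like the following. This is a minimal sketch, not the actual class: only the member names leftSample and rightSample appear in the code further down, while the constructors and operator signatures are assumptions. With the operators defined inline in a header, an optimizing compiler can usually reduce them to the same per-sample arithmetic as hand-written left/right code.

```cpp
#include <cassert>

// Hypothetical stereo frame. Only leftSample/rightSample are taken from the
// question's code; everything else here is assumed for illustration.
struct Frame {
    float leftSample;
    float rightSample;

    Frame() : leftSample(0.0f), rightSample(0.0f) {}
    Frame(float l, float r) : leftSample(l), rightSample(r) {}
};

// Operators are declared inline and take their arguments by const reference,
// so the compiler sees the bodies and no extra copies are forced.
inline Frame operator+(const Frame &a, const Frame &b) {
    return Frame(a.leftSample + b.leftSample, a.rightSample + b.rightSample);
}

inline Frame operator-(const Frame &a, const Frame &b) {
    return Frame(a.leftSample - b.leftSample, a.rightSample - b.rightSample);
}

inline Frame operator*(float gain, const Frame &f) {
    return Frame(gain * f.leftSample, gain * f.rightSample);
}

inline Frame operator*(const Frame &f, float gain) {
    return gain * f;
}
```

Whether the overloads cost anything in practice depends on the build settings; comparing an optimized build of both variants is the only reliable check.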

2) std::function

The callback function from the audio I/O library PortAudio calls a std::function. I use this to encapsulate everything related to PortAudio, so the "user" sets their own callback function with std::bind:

std::bind(  &AudioController::processAudio, 
            &(*this), 
            std::placeholders::_1, 
            std::placeholders::_2));

Since the CPU has to find the right function for every callback (however this works...), does this have an impact, and would it be better to define a class the user has to inherit from?
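
As a point of comparison, the same wiring can be sketched with a lambda instead of std::bind. AudioIO, setCallback and onBuffer are hypothetical names standing in for the PortAudio wrapper, which is not shown in the question; the processAudio body here is a placeholder. In either form the std::function is invoked once per buffer, not once per sample.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical wrapper around the audio backend (not a PortAudio type).
class AudioIO {
public:
    using Callback = std::function<void(int frameCount, float *output)>;

    void setCallback(Callback cb) { callback = std::move(cb); }

    // Assumed to be called from the real stream callback, once per buffer.
    void onBuffer(int frameCount, float *output) {
        if (callback) {
            callback(frameCount, output);   // one indirect call per buffer
        }
    }

private:
    Callback callback;
};

class AudioController {
public:
    explicit AudioController(AudioIO &io) {
        // The lambda captures this, so the controller must outlive the callback.
        io.setCallback([this](int frameCount, float *output) {
            processAudio(frameCount, output);
        });
    }

    // Placeholder body: writes silence to an interleaved stereo buffer.
    void processAudio(int frameCount, float *output) {
        for (int i = 0; i < 2 * frameCount; ++i) {
            output[i] = 0.0f;
        }
        ++callCount;
    }

    int callCount = 0;
};
```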

3) virtual functions

I use a class called AudioProcessor which declares a virtual function:

virtual void tick(Frame *buffer, int frameCount) = 0;

This function always processes a number of frames at once: depending on the driver, from 200 up to 1000 frames per call. Within the signal processing path, I call this function 6 times through several derived classes. I remember that this is done with lookup tables, so the CPU knows exactly which function it has to call. So does calling a virtual function (on a derived class) have an impact on performance?

The nice thing about this is the structure it gives the source code, but using only inline functions might improve performance.
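
To make the call pattern concrete, here is a minimal sketch of block-based virtual dispatch. GainProcessor and runChain are hypothetical and not part of the application; the Frame struct is reduced to the two members used below. The point it shows is that the virtual lookup happens once per tick call, while the per-frame loop inside each tick is ordinary non-virtual code.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Frame {
    float leftSample;
    float rightSample;
};

class AudioProcessor {
public:
    virtual ~AudioProcessor() = default;
    virtual void tick(Frame *buffer, int frameCount) = 0;
};

// Hypothetical processor used only for this illustration.
class GainProcessor final : public AudioProcessor {
public:
    explicit GainProcessor(float g) : gain(g) {}

    void tick(Frame *buffer, int frameCount) override {
        // Inner loop: no virtual calls, the compiler can optimize it freely.
        for (int i = 0; i < frameCount; ++i) {
            buffer[i].leftSample  *= gain;
            buffer[i].rightSample *= gain;
        }
    }

private:
    float gain;
};

// One virtual call per processor per block, regardless of the block size.
inline void runChain(const std::vector<std::unique_ptr<AudioProcessor>> &chain,
                     Frame *buffer, int frameCount) {
    for (const auto &processor : chain) {
        processor->tick(buffer, frameCount);
    }
}
```

With 6 such calls per block of 200 to 1000 frames, the dispatch count is 6 per block; a design that called a virtual function once per frame would be a different situation.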

These are all my questions for now. I have some more about Qt's event loop, because I think my GUI uses quite a bit of CPU time as well, but that is another topic, I guess. :)

Thanks in advance!

These are all the relevant function calls within the signal processing. Some of them are from the STK library. The biquad functions are from STK and should perform fine. The same goes for the FreeVerb algorithm.

// ################################ AudioController Function ############################
void AudioController::processAudio(int frameCount, float *output) {
    // CALCULATE LEFT TRACK

    Frame * leftFrameBuffer = (Frame*) output;

    if(leftLoaded) { // the left processor is loaded
        leftProcessor->tick(leftFrameBuffer, frameCount);   //(TrackProcessor::tick()
    } else {
        for(int i = 0; i < frameCount; i++) {
            leftFrameBuffer[i].leftSample  = 0.0f;
            leftFrameBuffer[i].rightSample = 0.0f;
        }
    }

    // CALCULATE RIGHT TRACK

    if(rightLoaded) { // the right processor is loaded
        // the rightFrameBuffer is allocated once and ensured to have enough space for frameCount Frames
        rightProcessor->tick(rightFrameBuffer, frameCount); //(TrackProcessor::tick()
    } else {
        for(int i = 0; i < frameCount; i++) {
            rightFrameBuffer[i].leftSample  = 0.0f;
            rightFrameBuffer[i].rightSample = 0.0f;
        }
    }

    // MIX
    for(int i = 0; i < frameCount; i++ ) {
        leftFrameBuffer[i] = volume * (leftRightMix * leftFrameBuffer[i] + (1.0f - leftRightMix) * rightFrameBuffer[i]);
    }
}

// ################################ TrackProcessor Function ############################

void TrackProcessor::tick(Frame *frames, int frameNum) {
    if(bufferLoaded && playback) {
        for(int i = 0; i < frameNum; i++) {
            // read from buffer
            frames[i] =  bufferPlayer->tick();

            // filter coeffs
            caltulateFilterCoeffs(lowCutoffFilter->tick(), highCutoffFilter->tick());

            // filter
            frames[i].leftSample = lpFilterL->tick(hpFilterL->tick(frames[i].leftSample));
            frames[i].rightSample = lpFilterR->tick(hpFilterR->tick(frames[i].rightSample));
        }
    } else {
        for(int i = 0; i < frameNum; i++) {         
            frames[i] = Frame(0,0);
        }
    }

    // Effect 1, Equalizer
    if(effsActive[0]) {
        insEffProcessors[0]->tick(frames, frameNum);
    }
    // Effect 2, Reverb
    if(effsActive[1]) {
        insEffProcessors[1]->tick(frames, frameNum);
    }

    // Volume
    for(int i = 0; i < frameNum; i++) {
        frames[i].leftSample  *= volume;
        frames[i].rightSample *= volume;
    }
}

// ################################ Equalizer ############################

void EqualizerProcessor::tick(Frame *frames, int frameNum) {
    if(active) {
        Frame lowCross;
        Frame highCross;

        for(int f = 0; f < frameNum; f++) {

            lowAmp = lowAmpFilter->tick();
            midAmp = midAmpFilter->tick();
            highAmp = highAmpFilter->tick();

            lowCross =  highLPF->tick(frames[f]);
            highCross = highHPF->tick(frames[f]);

            frames[f] = lowAmp * lowLPF->tick(lowCross) 
                      + midAmp * lowHPF->tick(lowCross) 
                      + highAmp * lowAPF->tick(highCross);
        }
    }
}

// ################################ Reverb ############################
// This function just calls the stk::FreeVerb tick function for every frame
// The FreeVerb implementation can't really be optimised so I will take it as it is.

void ReverbProcessor::tick(Frame *frames, int frameNum) {
    if(active) {
        for(int i = 0; i < frameNum; i++) {
            frames[i].leftSample = reverb->tick(frames[i].leftSample, frames[i].rightSample);
            frames[i].rightSample = reverb->lastOut(1);
        }
    }
}

// ################################ Buffer Playback (BufferPlayer) ############################

Frame BufferPlayer::tick() {
    // adjust read position based on loop status
    if(inLoop) {
        while(readPos > loopEndPos) {
            readPos = loopStartPos + (readPos - loopEndPos); 
        }
    }

    int x1  = readPos;
    float t = readPos - x1;

    Frame f = interpolate(buffer->frameAt(x1-1), 
                          buffer->frameAt(x1),
                          buffer->frameAt(x1+1),
                          buffer->frameAt(x1+2),
                          t);

    readPos += stepSize;
    return f;
}

// interpolation:
Frame BufferPlayer::interpolate(Frame x0, Frame x1, Frame x2, Frame x3, float t) {
    Frame c0 = x1;
    Frame c1 = 0.5f * (x2 - x0);
    Frame c2 = x0 - (2.5f * x1) + (2.0f * x2) - (0.5f * x3);
    Frame c3 = (0.5f * (x3 - x0)) + (1.5f * (x1 - x2));
    return (((((c3 * t) + c2) * t) + c1) * t) + c0;
}


inline Frame BufferPlayer::frameAt(int pos) {
    if(pos < 0) {
        pos = 0;
    } else if (pos >= frames) {
        pos = frames -1;
    }

    // get chunk and relative Sample
    int chunk = pos/ChunkSize;
    int chunkSample = pos%ChunkSize;

    return Frame(leftChunks[chunk][chunkSample], rightChunks[chunk][chunkSample]); 
}

Answers (1)

Optimize Data Cache Usage

Use Multiple Threads

Use the GPU processing

Reducing the Branching
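
One place this could apply in the question's code is frameAt, which clamps the index on every call and is called four times per output frame. A minimal sketch, assuming the source can be copied once into a padded buffer so the interpolator's reads at pos-1 to pos+2 never need a range check. PaddedBuffer and at are hypothetical names, and the buffer is mono for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical padded sample buffer: one guard sample in front and two
// behind, each repeating the nearest real sample. This reproduces the
// clamping behaviour of frameAt without a branch per read.
class PaddedBuffer {
public:
    // Assumes src is not empty.
    explicit PaddedBuffer(const std::vector<float> &src)
        : count(static_cast<int>(src.size())), data(src.size() + 3) {
        data[0] = src.front();
        for (std::size_t i = 0; i < src.size(); ++i) {
            data[i + 1] = src[i];
        }
        data[src.size() + 1] = src.back();
        data[src.size() + 2] = src.back();
    }

    int size() const { return count; }

    // Valid for pos in [-1, size() + 1]; the caller keeps the read position
    // inside [0, size() - 1], so pos - 1 and pos + 2 stay in range.
    float at(int pos) const {
        return data[static_cast<std::size_t>(pos + 1)];
    }

private:
    int count;
    std::vector<float> data;
};
```

The trade-off is one extra copy at load time and a few extra samples of memory in exchange for removing the per-read comparisons.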

Reduce the Amount of Processing
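
In the question's code, the filter coefficients are recalculated for every frame. One way to do less work, sketched here under assumptions, is to recalculate them only every few frames. OnePole is a hypothetical stand-in for the STK biquads, and the interval of 16 frames is an arbitrary example value, not a recommendation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical one-pole low-pass filter; stands in for the biquads.
struct OnePole {
    float a = 1.0f;    // smoothing coefficient
    float z = 0.0f;    // previous output
    int updates = 0;   // counts coefficient recalculations (illustration only)

    void setCutoff(float cutoffHz, float sampleRate) {
        // The exp call is the comparatively expensive part.
        a = 1.0f - std::exp(-2.0f * 3.14159265f * cutoffHz / sampleRate);
        ++updates;
    }

    float tick(float x) {
        z += a * (x - z);
        return z;
    }
};

const int kControlInterval = 16;   // assumed value, tune by listening

// Recalculates the coefficient once per kControlInterval frames instead of
// once per frame, then filters that sub-block with the fixed coefficient.
inline void processBlock(OnePole &filter, float *samples, int frameCount,
                         float cutoffHz, float sampleRate) {
    for (int start = 0; start < frameCount; start += kControlInterval) {
        filter.setCutoff(cutoffHz, sampleRate);
        int end = start + kControlInterval;
        if (end > frameCount) {
            end = frameCount;
        }
        for (int i = start; i < end; ++i) {
            samples[i] = filter.tick(samples[i]);
        }
    }
}
```

If the cutoff is being smoothed, the smoothed value would be read at the start of each sub-block; whether the coarser update rate is audible depends on how fast the parameter moves.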

Coding to Help the Compiler Optimize
