Reputation: 21
I've been working on implementing a system for real-time audio capture and analysis within an existing music software project. The goal of this system is to begin capturing audio when the user presses the record button (or after a specified count-in period), determine the notes the user sings or plays, and notate these notes on a musical staff. The gist of my method is to use one thread to capture chunks of audio data and put them into a queue, and another thread to remove the data from the queue and perform the analysis.
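For reference, the analysis side of that hand-off looks roughly like the sketch below; the class name, analyzeChunk(), and the loop structure are simplified stand-ins for my actual analysis thread rather than the real code.

import java.util.concurrent.ArrayBlockingQueue;

//Simplified sketch of the analysis (consumer) thread; names are illustrative only.
public class AnalysisThread extends Thread
{
    private final ArrayBlockingQueue<byte[]> processingQueue;

    public AnalysisThread(ArrayBlockingQueue<byte[]> processingQueue)
    {
        this.processingQueue = processingQueue;
    }

    public void run()
    {
        while (!isInterrupted())
        {
            byte[] chunk;
            synchronized (processingQueue)
            {
                while (processingQueue.isEmpty())
                {
                    try
                    { //woken by the capture thread's notify() on the queue
                        processingQueue.wait();
                    } catch (InterruptedException e)
                    {
                        return;
                    }
                }
                chunk = processingQueue.poll();
            }
            analyzeChunk(chunk); //placeholder for the pitch/rhythm analysis
        }
    }

    private void analyzeChunk(byte[] chunk) { /* pitch detection and rhythm analysis */ }
}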
This scheme works well, but I am having trouble quantifying the latency between the onset of audio capture and playback of the MIDI backing instruments. Audio capture begins before the MIDI instruments begin playing back, and the user is presumably going to be synchronizing his or her performance with the MIDI instruments. Therefore, I need to ignore audio data captured before the backing MIDI instruments begin playing and only analyze audio data collected after that point.
Playback of the backing tracks is handled by a body of code that has been in place for quite a while and is maintained by someone else, so I would like to avoid refactoring the whole program if possible. Audio capture is controlled with a Timer object and a class that extends TimerTask, instances of which are created in a lumbering (~25k lines) class called Notate. Notate also keeps tabs on the objects that handle playback of the backing tracks, by the way. The Timer's .scheduleAtFixedRate() method is used to control the periods of audio capture, and the TimerTask notifies the capture thread to begin by calling .notify() on a monitor object the capture thread waits on; the captured audio is then handed to the analysis thread through a queue (ArrayBlockingQueue).
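As a rough illustration of that arrangement, the scheduling and notification amount to something like the sketch below; the class and field names here are placeholders, not the real ones in Notate.

import java.util.Timer;
import java.util.TimerTask;

//Illustrative sketch only: captureMonitor stands in for the object the capture
//thread waits on (thisCapture in the code further down), and captureIntervalMs
//for the capture period derived from the tempo and meter.
public class CaptureScheduler
{
    private final Object captureMonitor;
    private final Timer captureTimer = new Timer();

    public CaptureScheduler(Object captureMonitor)
    {
        this.captureMonitor = captureMonitor;
    }

    public void start(long captureIntervalMs)
    {
        captureTimer.scheduleAtFixedRate(new TimerTask()
        {
            public void run()
            {
                synchronized (captureMonitor)
                { //wake the capture thread for the next capture interval
                    captureMonitor.notify();
                }
            }
        }, 0, captureIntervalMs);
    }
}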
My strategy for calculating the time gap between the start of these two processes has been to subtract the timestamp taken just before capture begins (in milliseconds) from the timestamp taken at the moment playback begins, which I define as the moment the .start() method is called on the Java Sequencer object in charge of the MIDI backing tracks. I then use the result to determine the number of audio samples (n) that I expect to have been captured during this interval and ignore the first n * 2 bytes in the array of captured audio data (n * 2 because I am capturing 16-bit samples, while the data is stored as a byte array: 2 bytes per sample).
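Stated as code, the conversion is essentially the following; this helper is just a restatement of the arithmetic above, and the method and parameter names are illustrative rather than part of my actual code.

//Restates the arithmetic described above: milliseconds of latency -> samples -> bytes.
static int latencyToByteOffset(long latencyMs, float sampleRate)
{
    //samples captured during the gap between capture start and playback start
    long samples = (long) ((latencyMs / 1000.0) * sampleRate);
    //16-bit samples stored in a byte array, so 2 bytes per sample
    return (int) (samples * 2);
}

For example, at 44100 Hz a 150 ms gap corresponds to 6615 samples, so the first 13230 bytes of the captured data would be skipped.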
However, this method is not giving me accurate results. The calculated offset is always less than I expect it to be, so there remains a non-trivial (and unfortunately variable) amount of "empty" space in the audio data after the designated starting position for analysis. This causes the program to analyze audio data collected before the user had begun to play along with the backing MIDI instruments, effectively adding rests (the absence of musical notes) at the beginning of the user's musical passage and ruining the rhythm values calculated for all subsequent notes.
Below is the code for my audio capture thread, which also determines the latency and corresponding position offset for the array of captured audio data. Can anyone offer insight into why my method for determining latency is not working correctly?
public class CaptureThread extends Thread
{
    public void run()
    {
        //number of bytes to capture before putting data in the queue.
        //determined via the sample rate, tempo, and # of "beats" in 1 "measure"
        int bytesToCapture = (int) ((SAMPLE_RATE * 2.) / (score.getTempo()
                / score.getMetre()[0] / 60.));
        //temporary buffer - will be added to ByteArrayOutputStream upon filling.
        byte tempBuffer[] = new byte[target.getBufferSize() / 5];
        int limit = (int) (bytesToCapture / tempBuffer.length);
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream(bytesToCapture);
        int bytesRead;
        try
        { //Loop until stopCapture is set.
            while (!stopCapture)
            { //first, wait for notification from TimerTask
                synchronized (thisCapture)
                {
                    thisCapture.wait();
                }
                if (!processingStarted)
                { //the time at which audio capture begins
                    startTime = System.currentTimeMillis();
                }
                //start the TargetDataLine, from which audio data is read
                target.start();
                //collect 1 captureInterval's worth of data
                for (int n = 0; n < limit; n++)
                {
                    bytesRead = target.read(tempBuffer, 0, tempBuffer.length);
                    if (bytesRead > 0)
                    { //Append data to output stream.
                        outputStream.write(tempBuffer, 0, bytesRead);
                    }
                }
                if (!processingStarted)
                {
                    long difference = (midiSynth.getPlaybackStartTime()
                            + score.getCountInTime() * 1000 - startTime);
                    positionOffset = (int) ((difference / 1000.)
                            * SAMPLE_RATE * 2.);
                    if (positionOffset % 2 != 0)
                    { //1 sample = 2 bytes, so positionOffset must be even
                        positionOffset += 1;
                    }
                }
                if (outputStream.size() > 0)
                { //package data collected in the output stream into a byte array
                    byte[] capturedAudioData = outputStream.toByteArray();
                    //add captured data to the queue for processing
                    processingQueue.add(capturedAudioData);
                    synchronized (processingQueue)
                    {
                        try
                        { //notify the analysis thread that data is in the queue
                            processingQueue.notify();
                        } catch (Exception e)
                        {
                            //handle the error
                        }
                    }
                    outputStream.reset(); //reset the output stream
                }
            }
        } catch (Exception e)
        {
            //handle error
        }
    }
}
I am looking into using a Mixer object to synchronize the TargetDataLine, which accepts data from the microphone, and the Line that handles playback of the MIDI instruments. Now to find the Line that handles playback... Any ideas?
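In case it helps, here is a minimal sketch of how I imagine checking a Mixer for explicit line synchronization support; targetLine and playbackLine are placeholders for the capture line and the (still unidentified) playback line, and I have not verified that any mixer on my system actually reports true here.

import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Line;
import javax.sound.sampled.Mixer;

public class MixerSyncFinder
{
    //Sketch only: returns the first mixer that claims it can keep the two lines
    //synchronized; many Mixer implementations simply return false.
    public static Mixer findSynchronizingMixer(Line targetLine, Line playbackLine)
    {
        Line[] lines = {targetLine, playbackLine};
        for (Mixer.Info info : AudioSystem.getMixerInfo())
        {
            Mixer mixer = AudioSystem.getMixer(info);
            if (mixer.isSynchronizationSupported(lines, true))
            {
                return mixer;
            }
        }
        return null;
    }
}

If such a mixer exists, calling mixer.synchronize(lines, true) should then keep the lines' start/stop together, though I don't know yet whether the implementations I have access to support this.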
Upvotes: 2
Views: 670
Reputation: 496
Google has a good open source app called AudioBufferSize that you are probably familiar with. I modified this app to test one-way latency, that is, the time between when a user presses a button and when the sound is played by the Audio API. Here is the code I added to AudioBufferSize to achieve this. Could you use such an approach to provide the timing delta between the event and when the user perceives it?
final Button latencyButton = (Button) findViewById(R.id.latencyButton);
latencyButton.setOnClickListener(new OnClickListener() {
    public void onClick(View v) {
        mLatencyStartTime = getCurrentTime();
        latencyButton.setEnabled(false);
        // Do the latency calculation: play a 440 Hz sound for 250 msec
        AudioTrack sound = generateTone(440, 250);
        // Listen for the end of the sample: set a marker at the last frame.
        // 250 ms of stereo 16-bit audio at 44100 Hz is 11025 frames.
        sound.setNotificationMarkerPosition((int) (44100L * 250 / 1000));
        sound.setPlaybackPositionUpdateListener(new OnPlaybackPositionUpdateListener() {
            public void onPeriodicNotification(AudioTrack sound) { }
            public void onMarkerReached(AudioTrack sound) {
                // The sound has finished playing, so record the time
                mLatencyStopTime = getCurrentTime();
                long diff = mLatencyStopTime - mLatencyStartTime;
                // Update the latency result
                TextView lat = (TextView) findViewById(R.id.latency);
                lat.setText(diff + " ms");
                latencyButton.setEnabled(true);
                logUI("Latency test result= " + diff + " ms");
            }
        });
        sound.play();
    }
});
There is a reference to generateTone, which looks like this:
private AudioTrack generateTone(double freqHz, int durationMs) {
    int count = (int) (44100.0 * 2.0 * (durationMs / 1000.0)) & ~1;
    short[] samples = new short[count];
    for (int i = 0; i < count; i += 2) {
        // i advances one stereo frame (two samples) at a time, so i / 2 is the frame index
        short sample = (short) (Math.sin(2 * Math.PI * (i / 2) / (44100.0 / freqHz)) * 0x7FFF);
        samples[i + 0] = sample;
        samples[i + 1] = sample;
    }
    AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, 44100,
            AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
            count * (Short.SIZE / 8), AudioTrack.MODE_STATIC);
    track.write(samples, 0, count);
    return track;
}
Just realized this question is several years old. Sorry, maybe someone will find it useful anyway.
Upvotes: 1