Reputation: 21
I've been working on implementing a system for real-time audio capture and analysis within an existing music software project. The goal of this system is to begin capturing audio when the user presses the record button (or after a specified count-in period), determine the notes the user sings or plays, and notate these notes on a musical staff. The gist of my method is to use one thread to capture chunks of audio data and put them into a queue, and another thread to remove the data from the queue and perform the analysis.
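For reference, the analysis side of that hand-off looks roughly like the sketch below; the class name, analyzeChunk(), and the loop structure are simplified stand-ins for my actual analysis thread rather than the real code.

import java.util.concurrent.ArrayBlockingQueue;

//Simplified sketch of the analysis (consumer) thread; names are illustrative only.
public class AnalysisThread extends Thread
{
    private final ArrayBlockingQueue<byte[]> processingQueue;

    public AnalysisThread(ArrayBlockingQueue<byte[]> processingQueue)
    {
        this.processingQueue = processingQueue;
    }

    public void run()
    {
        while (!isInterrupted())
        {
            byte[] chunk;
            synchronized (processingQueue)
            {
                while (processingQueue.isEmpty())
                {
                    try
                    { //woken by the capture thread's notify() on the queue
                        processingQueue.wait();
                    } catch (InterruptedException e)
                    {
                        return;
                    }
                }
                chunk = processingQueue.poll();
            }
            analyzeChunk(chunk); //placeholder for the pitch/rhythm analysis
        }
    }

    private void analyzeChunk(byte[] chunk) { /* pitch detection and rhythm analysis */ }
}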
This scheme works well, but I am having trouble quantifying the latency between the onset of audio capture and playback of the MIDI backing instruments. Audio capture begins before the MIDI instruments begin playing back, and the user is presumably going to be synchronizing his or her performance with the MIDI instruments. Therefore, I need to ignore audio data captured before the backing MIDI instruments begin playing and only analyze audio data collected after that point.
Playback of the backing tracks is handled by a body of code that has been in place for quite a while and is maintained by someone else, so I would like to avoid refactoring the whole program if possible. Audio capture is controlled with a Timer object and a class that extends TimerTask, instances of which are created in a lumbering (~25k lines) class called Notate. Notate also keeps tabs on the objects that handle playback of the backing tracks, by the way. The Timer's .scheduleAtFixedRate() method is used to control the periods of audio capture, and the TimerTask notifies the capture thread to begin by calling .notify() on a monitor object the capture thread waits on; the captured audio is then handed to the analysis thread through a queue (ArrayBlockingQueue).
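As a rough illustration of that arrangement, the scheduling and notification amount to something like the sketch below; the class and field names here are placeholders, not the real ones in Notate.

import java.util.Timer;
import java.util.TimerTask;

//Illustrative sketch only: captureMonitor stands in for the object the capture
//thread waits on (thisCapture in the code further down), and captureIntervalMs
//for the capture period derived from the tempo and meter.
public class CaptureScheduler
{
    private final Object captureMonitor;
    private final Timer captureTimer = new Timer();

    public CaptureScheduler(Object captureMonitor)
    {
        this.captureMonitor = captureMonitor;
    }

    public void start(long captureIntervalMs)
    {
        captureTimer.scheduleAtFixedRate(new TimerTask()
        {
            public void run()
            {
                synchronized (captureMonitor)
                { //wake the capture thread for the next capture interval
                    captureMonitor.notify();
                }
            }
        }, 0, captureIntervalMs);
    }
}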
My strategy for calculating the time gap between the start of these two processes has been to subtract the timestamp taken just before capture begins (in milliseconds) from the timestamp taken at the moment playback begins, which I define as the moment the .start() method is called on the Java Sequencer object in charge of the MIDI backing tracks. I then use the result to determine the number of audio samples (n) that I expect to have been captured during this interval and ignore the first n * 2 bytes in the array of captured audio data (n * 2 because I am capturing 16-bit samples, while the data is stored as a byte array: 2 bytes per sample).
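Stated as code, the conversion is essentially the following; this helper is just a restatement of the arithmetic above, and the method and parameter names are illustrative rather than part of my actual code.

//Restates the arithmetic described above: milliseconds of latency -> samples -> bytes.
static int latencyToByteOffset(long latencyMs, float sampleRate)
{
    //samples captured during the gap between capture start and playback start
    long samples = (long) ((latencyMs / 1000.0) * sampleRate);
    //16-bit samples stored in a byte array, so 2 bytes per sample
    return (int) (samples * 2);
}

For example, at 44100 Hz a 150 ms gap corresponds to 6615 samples, so the first 13230 bytes of the captured data would be skipped.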
However, this method is not giving me accurate results. The calculated offset is always less than I expect it to be, so there remains a non-trivial (and unfortunately variable) amount of "empty" space in the audio data after the designated starting position for analysis. This causes the program to analyze audio data collected before the user had begun to play along with the backing MIDI instruments, effectively adding rests (the absence of musical notes) at the beginning of the user's musical passage and ruining the rhythm values calculated for all subsequent notes.
Below is the code for my audio capture thread, which also determines the latency and corresponding position offset for the array of captured audio data. Can anyone offer insight into why my method for determining latency is not working correctly?
public class CaptureThread extends Thread
{
    public void run()
    {
        //number of bytes to capture before putting data in the queue.
        //determined via the sample rate, tempo, and # of "beats" in 1 "measure"
        int bytesToCapture = (int) ((SAMPLE_RATE * 2.) / (score.getTempo()
                / score.getMetre()[0] / 60.));
        //temporary buffer - will be added to ByteArrayOutputStream upon filling.
        byte tempBuffer[] = new byte[target.getBufferSize() / 5];
        int limit = (int) (bytesToCapture / tempBuffer.length);
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream(bytesToCapture);
        int bytesRead;
        try
        { //Loop until stopCapture is set.
            while (!stopCapture)
            { //first, wait for notification from TimerTask
                synchronized (thisCapture)
                {
                    thisCapture.wait();
                }
                if (!processingStarted)
                { //the time at which audio capture begins
                    startTime = System.currentTimeMillis();
                }
                //start the TargetDataLine, from which audio data is read
                target.start();
                //collect 1 captureInterval's worth of data
                for (int n = 0; n < limit; n++)
                {
                    bytesRead = target.read(tempBuffer, 0, tempBuffer.length);
                    if (bytesRead > 0)
                    { //Append data to output stream.
                        outputStream.write(tempBuffer, 0, bytesRead);
                    }
                }
                if (!processingStarted)
                {
                    long difference = (midiSynth.getPlaybackStartTime()
                            + score.getCountInTime() * 1000 - startTime);
                    positionOffset = (int) ((difference / 1000.)
                            * SAMPLE_RATE * 2.);
                    if (positionOffset % 2 != 0)
                    { //1 sample = 2 bytes, so positionOffset must be even
                        positionOffset += 1;
                    }
                }
                if (outputStream.size() > 0)
                { //package data collected in the output stream into a byte array
                    byte[] capturedAudioData = outputStream.toByteArray();
                    //add captured data to the queue for processing
                    processingQueue.add(capturedAudioData);
                    synchronized (processingQueue)
                    {
                        try
                        { //notify the analysis thread that data is in the queue
                            processingQueue.notify();
                        } catch (Exception e)
                        {
                            //handle the error
                        }
                    }
                    outputStream.reset(); //reset the output stream
                }
            }
        } catch (Exception e)
        {
            //handle error
        }
    }
}
I am looking into using a Mixer object to synchronize the TargetDataLine, which accepts data from the microphone, and the Line that handles playback of the MIDI instruments. Now to find the Line that handles playback... Any ideas?
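In case it helps, here is a minimal sketch of how I imagine checking a Mixer for explicit line synchronization support; targetLine and playbackLine are placeholders for the capture line and the (still unidentified) playback line, and I have not verified that any mixer on my system actually reports true here.

import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Line;
import javax.sound.sampled.Mixer;

public class MixerSyncFinder
{
    //Sketch only: returns the first mixer that claims it can keep the two lines
    //synchronized; many Mixer implementations simply return false.
    public static Mixer findSynchronizingMixer(Line targetLine, Line playbackLine)
    {
        Line[] lines = {targetLine, playbackLine};
        for (Mixer.Info info : AudioSystem.getMixerInfo())
        {
            Mixer mixer = AudioSystem.getMixer(info);
            if (mixer.isSynchronizationSupported(lines, true))
            {
                return mixer;
            }
        }
        return null;
    }
}

If such a mixer exists, calling mixer.synchronize(lines, true) should then keep the lines' start/stop together, though I don't know yet whether the implementations I have access to support this.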
Upvotes: 2
Views: 670
Reputation: 496
Google has a good open source app called AudioBufferSize that you are probably familiar with. I modified this app to test one-way latency, that is, the time between when a user presses a button and when the sound is played by the Audio API. Here is the code I added to AudioBufferSize to achieve this. Could you use such an approach to provide the timing delta between the event and when the user perceives it?
final Button latencyButton = (Button) findViewById(R.id.latencyButton);
latencyButton.setOnClickListener(new OnClickListener() {
    public void onClick(View v) {
        mLatencyStartTime = getCurrentTime();
        latencyButton.setEnabled(false);
        // Do the latency calculation: play a 440 Hz sound for 250 msec
        AudioTrack sound = generateTone(440, 250);
        // Listen for the end of the sample: set a marker at the last frame.
        // 250 ms of stereo 16-bit audio at 44100 Hz is 11025 frames.
        sound.setNotificationMarkerPosition((int) (44100L * 250 / 1000));
        sound.setPlaybackPositionUpdateListener(new OnPlaybackPositionUpdateListener() {
            public void onPeriodicNotification(AudioTrack sound) { }
            public void onMarkerReached(AudioTrack sound) {
                // The sound has finished playing, so record the time
                mLatencyStopTime = getCurrentTime();
                long diff = mLatencyStopTime - mLatencyStartTime;
                // Update the latency result
                TextView lat = (TextView) findViewById(R.id.latency);
                lat.setText(diff + " ms");
                latencyButton.setEnabled(true);
                logUI("Latency test result= " + diff + " ms");
            }
        });
        sound.play();
    }
});
There is a reference to generateTone, which looks like this:
private AudioTrack generateTone(double freqHz, int durationMs) {
    int count = (int) (44100.0 * 2.0 * (durationMs / 1000.0)) & ~1;
    short[] samples = new short[count];
    for (int i = 0; i < count; i += 2) {
        // i advances one stereo frame (two samples) at a time, so i / 2 is the frame index
        short sample = (short) (Math.sin(2 * Math.PI * (i / 2) / (44100.0 / freqHz)) * 0x7FFF);
        samples[i + 0] = sample;
        samples[i + 1] = sample;
    }
    AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, 44100,
            AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
            count * (Short.SIZE / 8), AudioTrack.MODE_STATIC);
    track.write(samples, 0, count);
    return track;
}
Just realized this question is several years old. Sorry, maybe someone will find it useful anyway.
Upvotes: 1