Reputation: 822
I am writing an application that should receive audio and send it to the Bing Recognition API to get text. I used the Service Library, and it works with a WAV file. So I wrote my own stream class to receive audio from a microphone or the network (RTP) and send it to the recognition API. When I add a WAV header in front of the audio stream, it works for a few seconds.
Debugging shows that the recognition API reads from the stream faster than the audio source fills it (16 kHz sample rate, 16-bit, mono).
So my question is: Is there a way to use the recognition API with a real-time (continuous) audio stream?
I know there is an example with a microphone client, but it works with microphone only and I need it for different sources.
Upvotes: 4
Views: 1526
Reputation: 121
Adding additional supporting information on this topic: The stream implementation has to support concurrent read/write operations, and block when it has no data.
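To illustrate that requirement, here is a minimal sketch of such a stream. It is not taken from the Service Library: BlockingAudioStream, AddData and Complete are hypothetical names, and it assumes one producer thread (mic or RTP receiver) and one consumer (the recognizer). It uses BlockingCollection<byte[]> so that Read blocks while no data is queued and returns 0 only after the producer has signaled the end of the audio.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical sketch: a one-way pipe exposed as a read-only Stream.
public sealed class BlockingAudioStream : Stream
{
    // Thread-safe queue of audio chunks; taking from it blocks while it is empty.
    private readonly BlockingCollection<byte[]> _chunks = new BlockingCollection<byte[]>();
    private byte[] _current;   // chunk currently being drained by Read
    private int _position;     // read offset inside _current

    // Producer side: call for every packet received from the audio source.
    public void AddData(byte[] buffer, int count)
    {
        var copy = new byte[count];
        Buffer.BlockCopy(buffer, 0, copy, 0, count);
        _chunks.Add(copy);
    }

    // Producer side: signals that no more audio will arrive.
    public void Complete() => _chunks.CompleteAdding();

    // Consumer side: blocks until data is available.
    // Returns 0 (end of stream) only after Complete() was called and all data was read.
    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        while (_current == null || _position == _current.Length)
        {
            // With an infinite timeout, TryTake waits for a chunk and returns
            // false once the collection is completed and empty.
            if (!_chunks.TryTake(out _current, Timeout.Infinite))
            {
                _current = null;
                return 0;
            }
            _position = 0;
        }
        int n = Math.Min(count, _current.Length - _position);
        Buffer.BlockCopy(_current, _position, buffer, offset, n);
        _position += n;
        return n;
    }

    // Remaining Stream members: read-only, non-seekable.
    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

The receiving thread would call AddData for each packet and Complete when the source closes. Copying each packet keeps the sketch simple; a ring buffer would avoid the per-packet allocation.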
Upvotes: 0
Reputation: 3137
If you want to use sources other than a microphone, you can use a DataRecognitionClient class, by calling SpeechRecognitionServiceFactory's CreateDataClient method. Once you have the client object, you can take audio from any source (microphone, network, reading from a file, etc.) and send it to be processed with the client's SendAudio method. As you receive each audio buffer, you make a new call to SendAudio.
While you're in the process of sending audio with SendAudio, you will receive partial recognition results in realtime (or close) in the form of the client's OnPartialResponse event.
When you're done sending audio, you signal to the client that you're ready for the final recognition result by calling EndAudio. You should then receive an OnResponseReceived event from the client containing the final recognition hypotheses.
Upvotes: 1
Reputation: 822
I found a solution for my problem. I wrote a class AudioStream, derived from Stream, which buffers the input and waits when the Read method is called while its buffer is empty. This prevents the recognizer from stopping, because the Read method always returns a value > 0.
Here is the important part of the class:
public class AudioStream : Stream {
    // Signaled by AddData whenever new audio has been buffered
    private AutoResetEvent _waitEvent = new AutoResetEvent(false);

    internal void AddData(byte[] buffer, int count) {
        _buffer.Add(buffer, count);
        // Enable Read
        _waitEvent.Set();
    }

    public override int Read(byte[] buffer, int offset, int count) {
        int readCount = 0;
        if (_buffer.Empty) {
            // Wait for input
            _waitEvent.WaitOne();
        }
        ......
        // Fill buffer from _buffer;
        _waitEvent.Reset();
        // Number of bytes copied into buffer
        return readCount;
    }

    protected override void Dispose(bool disposing) {
        // Make sure that there is no waiting Read
        // Clear buffer, dispose wait event etc.
    }
    ......
}
Because audio data is received continuously, the Read method will not "hang" for longer than a few milliseconds (e.g. RTP packets arrive every 20 ms).
Upvotes: 2