Reputation: 1329
I am a .NET Core developer and I have recently been asked to transcribe MP3 audio files, approximately 20 minutes long, into text. Each file is about 30.5 MB. The issue is that speech in these files is sparse: there can be anywhere from 2 to 4 minutes of silence between spoken sentences.
I've written a small service, based on the Google Speech documentation, that sends 32 KB of streaming data at a time from the file to be processed. All was progressing well until I hit the error I share below:
I have searched via Google-fu, Google forums, and other sources, and I have not found any documentation on this error. Suffice it to say, I think it is caused by the sparsity of spoken words in my file. I am wondering if there is a programmatic workaround.
The code I used is a slight modification of the Google .NET sample for 32 KB streaming. You can find it here.
public async Task Run() // async Task rather than async void, so callers can await it and see exceptions
{
    var speech = SpeechClient.Create();
    var streamingCall = speech.StreamingRecognize();
    // Write the initial request with the config.
    await streamingCall.WriteAsync(
        new StreamingRecognizeRequest()
        {
            StreamingConfig = new StreamingRecognitionConfig()
            {
                Config = new RecognitionConfig()
                {
                    Encoding =
                        RecognitionConfig.Types.AudioEncoding.Flac,
                    SampleRateHertz = 22050,
                    LanguageCode = "en",
                },
                InterimResults = true,
            }
        });
    // Helper Function: Print responses as they arrive.
    Task printResponses = Task.Run(async () =>
    {
        while (await streamingCall.ResponseStream.MoveNext(
            default(CancellationToken)))
        {
            foreach (var result in streamingCall.ResponseStream.Current.Results)
            {
                if (result.IsFinal)
                {
                    // Print each alternative's transcript rather than
                    // the collection itself.
                    foreach (var alternative in result.Alternatives)
                    {
                        Console.WriteLine(alternative.Transcript);
                    }
                }
            }
        }
    });
    string filePath = "mono_1.flac";
    using (FileStream fileStream = new FileStream(filePath, FileMode.Open))
    {
        //var buffer = new byte[32 * 1024];
        var buffer = new byte[64 * 1024]; // Trying a 64 KB buffer
        int bytesRead;
        while ((bytesRead = await fileStream.ReadAsync(
            buffer, 0, buffer.Length)) > 0)
        {
            await streamingCall.WriteAsync(
                new StreamingRecognizeRequest()
                {
                    AudioContent = Google.Protobuf.ByteString
                        .CopyFrom(buffer, 0, bytesRead),
                });
            await Task.Delay(500);
        }
    }
    await streamingCall.WriteCompleteAsync();
    await printResponses;
} // End of Run
I increased the stream to 64 KB of data at a time, and then received the following error:
Which, I believe, means the actual API timed out, which is decidedly a step in the wrong direction. Has anybody encountered a problem like mine with the Google Speech API when dealing with an audio file with sparse speech? Is there a way to programmatically filter the audio down to only the spoken words and then process that? I've sketched one idea below. I'm open to suggestions, but my research and attempts have only led me to further breaking my code.
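The one idea I've been toying with is to pre-filter the silent stretches out of the audio before streaming it. This is only a rough, untested sketch: it assumes the FLAC has first been decoded to headerless 16-bit mono PCM, and the window size and amplitude threshold are guesses on my part:
// Rough sketch: drop 100 ms windows of 16-bit mono PCM whose peak
// amplitude stays under a threshold. Assumes the FLAC has already been
// decoded to headerless PCM; the threshold and window size are guesses.
static byte[] StripSilence(byte[] pcm, short threshold = 500)
{
    const int windowSamples = 2205;               // 100 ms at 22050 Hz
    const int bytesPerWindow = windowSamples * 2; // 16-bit samples
    var output = new MemoryStream();
    for (int offset = 0; offset < pcm.Length; offset += bytesPerWindow)
    {
        int length = Math.Min(bytesPerWindow, pcm.Length - offset);
        bool hasSpeech = false;
        for (int i = offset; i + 1 < offset + length; i += 2)
        {
            short sample = BitConverter.ToInt16(pcm, i);
            if (Math.Abs((int)sample) > threshold)
            {
                hasSpeech = true;
                break;
            }
        }
        if (hasSpeech)
        {
            output.Write(pcm, offset, length); // keep only windows with signal
        }
    }
    return output.ToArray();
}
I haven't verified whether dropping the silent stretches confuses the recognizer, so treat this as a starting point rather than a solution.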
Upvotes: 2
Views: 526
Reputation: 1598
There are two ways to recognize audio in the Google Speech API:
Your sample uses streaming recognition, which has a limit of 15 minutes. Try the long-running recognize method instead:
{
    var speech = SpeechClient.Create();
    var longOperation = speech.LongRunningRecognize( new RecognitionConfig()
    {
        Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
        SampleRateHertz = 16000,
        LanguageCode = "hu",
    }, RecognitionAudio.FromFile( filePath ) );
    longOperation = longOperation.PollUntilCompleted();
    var response = longOperation.Result;
    foreach ( var result in response.Results )
    {
        foreach ( var alternative in result.Alternatives )
        {
            Console.WriteLine( alternative.Transcript );
        }
    }
    return 0;
}
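Note that, as far as I know, LongRunningRecognize with inline audio content only works for relatively short files; for something like your 20-minute recording, you would upload the file to Google Cloud Storage first and reference it by URI. A sketch using your encoding settings (the bucket and object names are placeholders):
var speech = SpeechClient.Create();
// For long audio, point the API at a file uploaded to Cloud Storage
// instead of sending the bytes inline. "your-bucket" and the object
// name below are placeholders.
var longOperation = speech.LongRunningRecognize( new RecognitionConfig()
{
    Encoding = RecognitionConfig.Types.AudioEncoding.Flac,
    SampleRateHertz = 22050,
    LanguageCode = "en",
}, RecognitionAudio.FromStorageUri( "gs://your-bucket/mono_1.flac" ) );
longOperation = longOperation.PollUntilCompleted();
foreach ( var result in longOperation.Result.Results )
{
    foreach ( var alternative in result.Alternatives )
    {
        Console.WriteLine( alternative.Transcript );
    }
}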
I hope this helps.
Upvotes: 0