Reputation: 33
I'd like to transcribe a couple of long (Dutch) audio files. They are interviews of about 60-120 minutes each. I only have 8 files and will handle them manually, so this doesn't need to be part of an automated pipeline. I have some Azure credits, so I thought I'd go with Azure Cognitive Services Speech to Text. Is there a sample somewhere for that?
I tried this sample: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-to-text-sample. It works fine, but it stops right away after a small pause in the audio.
I saw a similar question here: Speech-to-text large audio files [Microsoft Speech API], but the poster didn't share how they solved it.
Can somebody help out?
Upvotes: 3
Views: 5172
Reputation: 11
Here is a simple Python example that transcribes a large audio file to a text file. (It doesn't use batch processing, so it takes a while. Hope it helps anyway.)
import time
import azure.cognitiveservices.speech as speechsdk


def transcribe(key, region, lang, path_in, path_out="out.txt", newLine=False):
    """Transcribe the audio file at path_in and write the text to path_out."""
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = lang
    audio_config = speechsdk.audio.AudioConfig(filename=path_in)
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    done = False
    textOut = ""
    # Optional separator appended after each recognized phrase.
    str_newLine = " \n" if newLine else ""

    def stop_cb(evt):
        # Fires when the session stops or is canceled (end of file or error).
        nonlocal done
        print(evt)
        done = True

    def outPrint(evt):
        # Fires once per finalized phrase; collect its text.
        nonlocal textOut
        tmp_text = evt.result.text
        textOut += tmp_text + str_newLine
        print(tmp_text)

    speech_recognizer.recognized.connect(outPrint)
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Continuous recognition carries on across pauses until the file ends.
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)
    speech_recognizer.stop_continuous_recognition()

    # UTF-8 keeps accented characters intact regardless of the OS default.
    with open(path_out, 'w', encoding="utf-8") as f:
        f.write(textOut)


if __name__ == "__main__":
    key = "YOUR_KEY"
    region = "REGION_eg_westus"
    lang = "INPUT_LANGUAGE"  # e.g. "nl-NL" for Dutch. See e.g. https://learn.microsoft.com/en-us/dynamics365/fin-ops-core/dev-itpro/help/language-locale
    path_in = ""
    path_out = ""
    transcribe(key, region, lang, path_in, path_out)
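Since the question mentions eight files, a small driver loop could run the function above over a whole folder. This is only a sketch: `plan_jobs` and `transcribe_all` are hypothetical helper names, the `.wav` filter is an assumption (as far as I know the SDK reads WAV from a file by default), and `transcribe_fn` stands in for a call to `transcribe`.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def plan_jobs(audio_dir: str, out_dir: str,
              suffix: str = ".wav") -> List[Tuple[Path, Path]]:
    """Pair every audio file in audio_dir with a .txt path in out_dir."""
    out = Path(out_dir)
    jobs = []
    for audio in sorted(Path(audio_dir).glob(f"*{suffix}")):
        jobs.append((audio, out / (audio.stem + ".txt")))
    return jobs


def transcribe_all(jobs: List[Tuple[Path, Path]],
                   transcribe_fn: Callable[[str, str], None]) -> List[Path]:
    """Call transcribe_fn(path_in, path_out) for each job not yet done."""
    written = []
    for audio, text in jobs:
        if text.exists():
            continue  # transcript already exists, so skip this file
        text.parent.mkdir(parents=True, exist_ok=True)
        transcribe_fn(str(audio), str(text))
        written.append(text)
    return written
```

It could be used as `transcribe_all(plan_jobs("interviews", "transcripts"), lambda i, o: transcribe(key, region, "nl-NL", i, o))`, where the folder names are placeholders. Skipping existing transcripts means an interrupted run can be restarted without redoing finished files.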
Upvotes: 1
Reputation: 91
For longer audio files, we recommend the batch transcription APIs. A good explanation is here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription and there are samples for C# and Python here: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch.
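As a rough illustration of what creating a batch transcription job looks like, here is a minimal sketch that only builds the HTTP request. It assumes the v3.1 REST endpoint and its `contentUrls`, `locale` and `displayName` fields; please check these against the linked documentation for the API version you use. `build_batch_request` is a hypothetical helper name, and the audio has to be reachable by URL (for example a blob storage link).

```python
import json
import urllib.request
from typing import Iterable


def build_batch_request(key: str, region: str, content_urls: Iterable[str],
                        locale: str = "nl-NL",
                        display_name: str = "Interview transcription",
                        api_version: str = "v3.1") -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a batch job."""
    # Assumed endpoint layout for the Speech to text REST API.
    url = (f"https://{region}.api.cognitive.microsoft.com"
           f"/speechtotext/{api_version}/transcriptions")
    body = {
        "contentUrls": list(content_urls),  # URLs of the audio files
        "locale": locale,                   # language of the recordings
        "displayName": display_name,        # label for the job
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(request)` should return a JSON description of the new job, which you then poll until it has finished. The samples linked above cover that polling and the download of the results.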
Upvotes: 3