Gerben van Loon

Reputation: 33

Azure Cognitive Services Speech to Text large/long audio files sample

I'd like to transcribe a few long (Dutch) audio files. They are interviews of about 60 to 120 minutes each. There are only 8 files, which I can process by hand, so this doesn't need to be part of any automated software. I have some Azure credits, so I thought I'd go with Azure Cognitive Services Speech to Text. Is there a sample somewhere for that?

I tried this sample: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-to-text-sample. It works fine, but it stops right after the first short pause in the audio.

I saw a similar question here: Speech-to-text large audio files [Microsoft Speech API]. But the poster didn't share how they solved it.

Can somebody help out?

Upvotes: 3

Views: 5172

Answers (2)

CandyColor

Reputation: 11

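Here is a simple Python example that transcribes a large audio file to a text file. It uses continuous recognition, so it keeps going across pauses instead of stopping after the first utterance. (It's not using batch processing, so it takes a while. Hope it helps anyway.)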

import time
import azure.cognitiveservices.speech as speechsdk

def transcribe(key, region, lang, path_in, path_out="out.txt", newLine=False):
    # Configure the service and point the recognizer at the audio file.
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = lang
    audio_config = speechsdk.audio.AudioConfig(filename=path_in)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    done = False

    textOut = ""
    def stop_cb(evt):
        # Called when the session stops or is canceled (end of file or an error).
        print(evt)
        nonlocal done
        done = True

    # Separator written after each recognized phrase.
    str_newLine = "\n" if newLine else " "

    def outPrint(evt):
        # Called once per finalized phrase; collect its text.
        nonlocal textOut
        tmp_text = evt.result.text
        textOut += tmp_text + str_newLine
        print(tmp_text)

    speech_recognizer.recognized.connect(outPrint)
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)
    speech_recognizer.start_continuous_recognition()

    # Recognition runs in the background; wait here until it has finished.
    while not done:
        time.sleep(.5)
    speech_recognizer.stop_continuous_recognition()

    # UTF-8 keeps accented characters intact on every platform.
    with open(path_out, 'w', encoding='utf-8') as f:
        f.write(textOut)

if __name__ == "__main__":
    key = "YOUR_KEY"
    region = "REGION_eg_westus"
    lang = "INPUT_LANGUAGE" # Locale code, e.g. nl-NL for Dutch. See e.g. https://learn.microsoft.com/en-us/dynamics365/fin-ops-core/dev-itpro/help/language-locale
    path_in = "" # Path to the input audio file
    path_out = "" # Path of the text file to write
    transcribe(key, region, lang, path_in, path_out)
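
Since the question mentions 8 files, a small loop around the function above may save some typing. This is only a sketch: `transcribe_folder`, the `*.wav` pattern, and the two-argument `transcribe_fn` wrapper are hypothetical names chosen for illustration, not part of the Speech SDK.

```python
from pathlib import Path

def transcribe_folder(folder, transcribe_fn, pattern="*.wav"):
    """Call transcribe_fn(path_in, path_out) for every matching file in folder.

    transcribe_fn is a hypothetical two-argument wrapper around the
    transcribe() function from this answer. Returns the text files written.
    """
    outputs = []
    for audio_path in sorted(Path(folder).glob(pattern)):
        text_path = audio_path.with_suffix(".txt")
        if text_path.exists():
            # Skip files that already have a transcript, so an
            # interrupted run can be resumed without redoing work.
            continue
        transcribe_fn(str(audio_path), str(text_path))
        outputs.append(text_path)
    return outputs
```

It could be called as `transcribe_folder("interviews", lambda i, o: transcribe(key, region, "nl-NL", i, o))`, where `"interviews"` is a placeholder folder name.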

Upvotes: 1

Ralf Beckers

Reputation: 91

For longer audio files, we recommend the batch transcription APIs. A good explanation is here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription. There are samples for C# and Python here: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch.
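
As a rough illustration of what the batch route involves, the sketch below builds (but does not send) the HTTP request that creates a transcription job. It rests on assumptions that should be checked against the documentation linked above: the `v3.1` endpoint path, the `contentUrls`, `locale`, and `displayName` field names, and that the audio files are reachable by URL (for example in blob storage). `build_batch_request` is a hypothetical helper name.

```python
import json
import urllib.request

def build_batch_request(key, region, content_urls, locale="nl-NL",
                        display_name="Interview transcription"):
    """Build a POST request that would create a batch transcription job.

    Assumption: endpoint path and JSON field names follow the v3.1
    Speech to Text REST API; verify them in the official docs.
    """
    url = (f"https://{region}.api.cognitive.microsoft.com"
           "/speechtotext/v3.1/transcriptions")
    body = {
        "contentUrls": list(content_urls),  # URLs the service can download
        "locale": locale,                   # language of the recordings
        "displayName": display_name,        # free-text label for the job
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would submit the job. The samples in the linked repository show the remaining steps, such as waiting for the job to finish and downloading the results.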

Upvotes: 3
