Microsoft Speech to Text Python SDK SPXERR_INVALID_HEADER issue

I'm getting the following error when using the Microsoft Python Speech-to-Text Quickstart ("Quickstart: Recognize speech from an audio file") with the azure-cognitiveservices-speech v1.8.0 SDK.

RuntimeError: Exception with an error code: 0xa (SPXERR_INVALID_HEADER)

There are just 3 inputs to this file:

Azure Subscription Key
Azure Service Region
Filename

I'm using the following test MP3 file:

https://github.com/grokify/go-transcribe/blob/master/examples/mongodb-is-web-scale/web-scale_b2F-DItXtZs.mp3

Here's the full output:

Traceback (most recent call last):
  File "main.py", line 16, in <module>
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/azure/cognitiveservices/speech/speech.py", line 761, in __init__
    self._impl = self._get_impl(impl.SpeechRecognizer, speech_config, audio_config)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/azure/cognitiveservices/speech/speech.py", line 547, in _get_impl
    _impl = reco_type._from_config(speech_config._impl, audio_config._impl)
RuntimeError: Exception with an error code: 0xa (SPXERR_INVALID_HEADER)
[CALL STACK BEGIN]

3   libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106ad88d2 CreateModuleObject + 1136482
4   libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106ad7f4f CreateModuleObject + 1134047
5   libMicrosoft.CognitiveServices.Speech.core.dylib 0x00000001069d1803 CreateModuleObject + 59027
6   libMicrosoft.CognitiveServices.Speech.core.dylib 0x00000001069d1503 CreateModuleObject + 58259
7   libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a11c64 CreateModuleObject + 322292
8   libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a10be5 CreateModuleObject + 318069
9   libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a0e5a2 CreateModuleObject + 308274
10  libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a0e7c3 CreateModuleObject + 308819
11  libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106960bc7 recognizer_create_speech_recognizer_from_config + 3863
12  libMicrosoft.CognitiveServices.Speech.core.dylib 0x000000010695fd74 recognizer_create_speech_recognizer_from_config + 196
13  _speech_py_impl.so                  0x00000001067ff35b PyInit__speech_py_impl + 814939
14  _speech_py_impl.so                  0x000000010679b530 PyInit__speech_py_impl + 405808
15  Python                              0x00000001060f65dc _PyMethodDef_RawFastCallKeywords + 668
16  Python                              0x00000001060f5a5a _PyCFunction_FastCallKeywords + 42
17  Python                              0x00000001061b45a4 call_function + 724
18  Python                              0x00000001061b1576 _PyEval_EvalFrameDefault + 25190
19  Python                              0x00000001060f5e90 function_code_fastcall + 128
20  Python                              0x00000001061b45b2 call_function + 738
21  Python                              0x00000001061b1576 _PyEval_EvalFrameDefault + 25190
22  Python                              0x00000001061b50d6 _PyEval_EvalCodeWithName + 2422
23  Python                              0x00000001060f55fb _PyFunction_FastCallDict + 523
24  Python                              0x00000001060f68cf _PyObject_Call_Prepend + 143
25  Python                              0x0000000106144d51 slot_tp_init + 145
26  Python                              0x00000001061406a9 type_call + 297
27  Python                              0x00000001060f5871 _PyObject_FastCallKeywords + 433
28  Python                              0x00000001061b4474 call_function + 420
29  Python                              0x00000001061b16bd _PyEval_EvalFrameDefault + 25517
30  Python                              0x00000001061b50d6 _PyEval_EvalCodeWithName + 2422
31  Python                              0x00000001061ab234 PyEval_EvalCode + 100
32  Python                              0x00000001061e88f1 PyRun_FileExFlags + 209
33  Python                              0x00000001061e816a PyRun_SimpleFileExFlags + 890
34  Python                              0x00000001062079db pymain_main + 6875
35  Python                              0x0000000106207f2a _Py_UnixMain + 58
36  libdyld.dylib                       0x00007fff5d8aaed9 start + 1
37  ???                                 0x0000000000000002 0x0 + 2

Can anyone provide some pointers on which header this is referring to and how to resolve it?

Upvotes: 5

Answers (3)

ati ince

Reputation: 151

As far as I know, there is no official way to use the SDK with other formats (MP3, or a different frame rate). I would like an Azure method that accepts any type of audio file as input.

Until then, I am using my own workaround: first convert the file to a suitable format, then delete the converted copy once the job is finished. The original file is preserved:

For Python:

fname_buf = fname
fname = self.AudioFileAdjust(fname,'test-it')

# Do something with fname

if fname_buf != fname:
    self.AudioFileAdjust(fname,'remove')

The AudioFileAdjust helper (I am using pydub and pyaudio):

import os

def AudioFileAdjust(self, fname, states=''):
    '''
    Check the audio file format and, if it is not suitable, create a converted copy to use instead.
    Call with states='remove' to delete the converted copy afterwards.
    '''
    if states == 'remove':
        os.remove(fname)
    else:
        # Azure needs a 16000 Hz frame rate, so resample the file if it differs.
        # `au` is an audio helper module that is not shown here.
        audio_file = au.ReadAudioFile(fname)
        if audio_file.frame_rate != 16000:
            #print('[Comment] changing the frame rate')
            audio_file_e = au.SetFramerate(audio_file, 16000)
            # Build the new file name: strip the extension, then add a suffix and ".wav"
            fname2 = os.path.splitext(fname)[0] + "_Conv_2" + ".wav"
            au.ExportAudioFile(audio_file_e, fname2)
            #print('new file name: ', fname2)
            fname = fname2
    return fname
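
The same convert-then-clean-up idea can be sketched more compactly with pydub alone. This is only an illustration under assumptions: the names converted_path and convert_for_azure are made up for this example, pydub must be installed, and pydub needs ffmpeg available to decode MP3 input. The target format (16 kHz, 16-bit, mono WAV) is the one named in the other answers.

```python
import os


def converted_path(fname, suffix="_16k_mono"):
    """Return a sibling path for the converted copy (hypothetical naming scheme)."""
    # splitext only strips the final extension, so dots in folder names are safe.
    root, _ext = os.path.splitext(fname)
    return root + suffix + ".wav"


def convert_for_azure(fname):
    """Write a 16 kHz, 16-bit, mono WAV copy of fname and return its path."""
    # Imported lazily so the path helper above works without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(fname)
    # Each setter returns a new AudioSegment; sample width is given in bytes.
    audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
    out_path = converted_path(fname)
    audio.export(out_path, format="wav")
    return out_path
```

Pass the returned path to the recognizer, then call os.remove on it when the job is done. The original file is never modified.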

Upvotes: 0

Thilina Sandunsiri

Reputation: 590

The default audio streaming format is WAV (16 kHz or 8 kHz, 16-bit, mono PCM). Outside of WAV/PCM, some compressed input formats are also supported; they are listed in the documentation linked below.

However, if you use C#, Java, C++, or Objective-C and want to use compressed audio formats such as .mp3, you can handle them by using GStreamer.

For more information, see this Microsoft documentation:

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-codec-compressed-audio-input-streams

Upvotes: 5

chlandsi-msft

Reputation: 106

MP3-encoded audio is not supported as an input format. Please use a WAV (PCM) file with 16-bit samples, a 16 kHz sample rate, and a single channel (mono).
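
To check a file against these requirements before handing it to the SDK, something like the following sketch may help. It uses only Python's standard wave module; the function name wav_problems is made up for this example, and the expected values are simply the ones stated above. An MP3 file has no RIFF/WAVE header, so it fails at the very first step.

```python
import wave


def wav_problems(path):
    """Return a list of reasons why path is not 16 kHz, 16-bit, mono PCM WAV."""
    try:
        with wave.open(path, "rb") as wav:
            channels = wav.getnchannels()
            sample_width = wav.getsampwidth()  # in bytes
            frame_rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # Raised when the file does not start with a valid RIFF/WAVE header.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if channels != 1:
        problems.append(f"expected 1 channel, found {channels}")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if frame_rate != 16000:
        problems.append(f"expected 16000 Hz, found {frame_rate} Hz")
    return problems
```

An empty list means the file matches the format described above; any other result tells you what to convert first.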

Upvotes: 9
