Reputation: 16354
I'm getting the following error when using the Microsoft Python Speech-to-Text Quickstart ("Quickstart: Recognize speech from an audio file") with the azure-cognitiveservices-speech v1.8.0 SDK.
RuntimeError: Exception with an error code: 0xa (SPXERR_INVALID_HEADER)
There are just 3 inputs to this file:
I'm using the following test MP3 file:
Here's the full output:
Traceback (most recent call last):
  File "main.py", line 16, in <module>
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/azure/cognitiveservices/speech/speech.py", line 761, in __init__
    self._impl = self._get_impl(impl.SpeechRecognizer, speech_config, audio_config)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/azure/cognitiveservices/speech/speech.py", line 547, in _get_impl
    _impl = reco_type._from_config(speech_config._impl, audio_config._impl)
RuntimeError: Exception with an error code: 0xa (SPXERR_INVALID_HEADER)
[CALL STACK BEGIN]
3 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106ad88d2 CreateModuleObject + 1136482
4 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106ad7f4f CreateModuleObject + 1134047
5 libMicrosoft.CognitiveServices.Speech.core.dylib 0x00000001069d1803 CreateModuleObject + 59027
6 libMicrosoft.CognitiveServices.Speech.core.dylib 0x00000001069d1503 CreateModuleObject + 58259
7 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a11c64 CreateModuleObject + 322292
8 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a10be5 CreateModuleObject + 318069
9 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a0e5a2 CreateModuleObject + 308274
10 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106a0e7c3 CreateModuleObject + 308819
11 libMicrosoft.CognitiveServices.Speech.core.dylib 0x0000000106960bc7 recognizer_create_speech_recognizer_from_config + 3863
12 libMicrosoft.CognitiveServices.Speech.core.dylib 0x000000010695fd74 recognizer_create_speech_recognizer_from_config + 196
13 _speech_py_impl.so 0x00000001067ff35b PyInit__speech_py_impl + 814939
14 _speech_py_impl.so 0x000000010679b530 PyInit__speech_py_impl + 405808
15 Python 0x00000001060f65dc _PyMethodDef_RawFastCallKeywords + 668
16 Python 0x00000001060f5a5a _PyCFunction_FastCallKeywords + 42
17 Python 0x00000001061b45a4 call_function + 724
18 Python 0x00000001061b1576 _PyEval_EvalFrameDefault + 25190
19 Python 0x00000001060f5e90 function_code_fastcall + 128
20 Python 0x00000001061b45b2 call_function + 738
21 Python 0x00000001061b1576 _PyEval_EvalFrameDefault + 25190
22 Python 0x00000001061b50d6 _PyEval_EvalCodeWithName + 2422
23 Python 0x00000001060f55fb _PyFunction_FastCallDict + 523
24 Python 0x00000001060f68cf _PyObject_Call_Prepend + 143
25 Python 0x0000000106144d51 slot_tp_init + 145
26 Python 0x00000001061406a9 type_call + 297
27 Python 0x00000001060f5871 _PyObject_FastCallKeywords + 433
28 Python 0x00000001061b4474 call_function + 420
29 Python 0x00000001061b16bd _PyEval_EvalFrameDefault + 25517
30 Python 0x00000001061b50d6 _PyEval_EvalCodeWithName + 2422
31 Python 0x00000001061ab234 PyEval_EvalCode + 100
32 Python 0x00000001061e88f1 PyRun_FileExFlags + 209
33 Python 0x00000001061e816a PyRun_SimpleFileExFlags + 890
34 Python 0x00000001062079db pymain_main + 6875
35 Python 0x0000000106207f2a _Py_UnixMain + 58
36 libdyld.dylib 0x00007fff5d8aaed9 start + 1
37 ??? 0x0000000000000002 0x0 + 2
Can anyone provide some pointers on which header this error refers to and how to resolve it?
Upvotes: 5
Views: 8447
Reputation: 151
As far as I know, there is no official way to use the SDK with other formats (MP3, or a different frame rate). I would like an Azure method that accepts any type of audio file as input.
Until then, I use my own workaround: first convert the file to a suitable format, then delete the converted copy once I am done. The original file is preserved:
For Python:
fname_buf = fname
fname = self.AudioFileAdjust(fname, 'test-it')
# Do something with fname
if fname_buf != fname:
    self.AudioFileAdjust(fname, 'remove')
Subfunction AudioFileAdjust (I am using pydub and pyaudio):
def AudioFileAdjust(self, fname, states=''):
    '''
    Check the audio file format; if it is not suitable for Azure,
    write a converted copy and return its name.
    '''
    # requires "import os" at the top of the module
    if states == 'remove':
        os.remove(fname)
    else:
        # Azure needs a frame rate of 16000 Hz, so convert the file if it differs
        audio_file = au.ReadAudioFile(fname)
        if audio_file.frame_rate != 16000:
            #print('[Comment] changing the frame rate')
            audio_file_e = au.SetFramerate(audio_file, 16000)
            # build the new file name: drop the original extension, add a suffix and ".wav"
            fname2 = os.path.splitext(fname)[0] + "_Conv_2" + ".wav"
            au.ExportAudioFile(audio_file_e, fname2)
            #print('new file name: ', fname2)
            fname = fname2
    return fname
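For reference, the same kind of conversion can be done without my helper module by calling the ffmpeg command-line tool. This is only a sketch: it assumes ffmpeg is installed and on the PATH, and the function names `build_ffmpeg_command` and `convert_to_wav` are made up for illustration. Unlike the method above, it also forces mono and 16-bit PCM, not just the frame rate.

```python
import subprocess


def build_ffmpeg_command(src, dst):
    """Build an ffmpeg command that writes `src` as 16 kHz, mono, 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite dst if it already exists
        "-i", src,            # input file (MP3, WAV at another rate, ...)
        "-ac", "1",           # one audio channel (mono)
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src, dst):
    """Run ffmpeg and return the path of the converted file.

    Raises subprocess.CalledProcessError if ffmpeg exits with an error.
    """
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst
```

Usage would look like `fname = convert_to_wav("input.mp3", "input_Conv_2.wav")`, followed by `os.remove(fname)` once recognition is finished.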
Upvotes: 0
Reputation: 590
The default audio streaming format is WAV (16kHz or 8kHz, 16-bit, and mono PCM). Outside of WAV / PCM, the compressed input formats listed below are also supported.
However, if you use C#, Java, C++, or Objective-C and want to use compressed audio formats such as .mp3, you can handle them with GStreamer.
For more information follow this Microsoft documentation.
Upvotes: 5
Reputation: 106
MP3-encoded audio is not supported as an input format. Please use a WAV (PCM) file with 16-bit samples, a 16 kHz sample rate, and a single channel (mono).
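A quick way to check a file before passing it to the SDK is to read its WAV header with Python's standard `wave` module. This is a minimal sketch and not part of the Speech SDK; the function name `is_sdk_ready_wav` is made up for illustration. An MP3 file does not start with a RIFF/WAV header, which is presumably what SPXERR_INVALID_HEADER is reporting.

```python
import wave


def is_sdk_ready_wav(path, allowed_rates=(16000,)):
    """Return True if `path` is a PCM WAV file that is mono, 16-bit,
    and sampled at one of `allowed_rates` (16 kHz by default)."""
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getnchannels() == 1                 # single channel (mono)
                and wav.getsampwidth() == 2             # 2 bytes = 16-bit samples
                and wav.getframerate() in allowed_rates
                and wav.getcomptype() == "NONE"         # uncompressed PCM
            )
    except (wave.Error, EOFError):
        # Not a readable WAV file at all, e.g. an MP3 with no RIFF header
        return False
```

If this returns False for your input, convert the file to the format above before creating the recognizer.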
Upvotes: 9