Ingrid

Reputation: 526

UnicodeDecodeError from sound file

I'm trying to build a speech recogniser in Python using the Google speech API. I've been adapting the code from here (converted to Python 3). The audio file is on my computer and was converted from MP3 to 16000 Hz FLAC (as specified in the original code) with an online converter. When I run the code I get this error:

$ python3 speech_api.py 02-29-2016_00-12_msg1.flac 
Traceback (most recent call last):
  File "speech_api.py", line 12, in <module>
    data = f.read()
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 9: invalid start byte

This is my code. (I'm sure other parts still don't work in Python 3, as I've been trying to adapt it and am new to urllib...)

#!/usr/bin/python
import sys
from urllib.request import urlopen
import json
try:
    filename = sys.argv[1]
except IndexError:
    print('Usage: transcribe.py <file>')
    sys.exit(1)

with open(filename) as f:
    data = f.read()

req = urllib.request('https://www.google.com/intl/en/chrome/demos/speech.html', data=data, headers={'Content-type': 'audio/x-flac; rate=16000'})

try:
    ret = urllib.urlopen(req)
except urllib.URLError:
    print("Error Transcribing Voicemail")
    sys.exit(1)

resp = ret.read()
text = json.loads(resp)['hypotheses'][0]['utterance']
print(text)

Any ideas what I could do?

Upvotes: 2

Views: 2520

Answers (1)

Martijn Pieters

Reputation: 1121486

You need to open the file in binary mode:

open(filename, 'rb')

Note the 'b'. Without it the file is opened in text mode and its bytes are decoded to Unicode (here with the UTF-8 codec), which fails on binary audio data.
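
As a rough illustration of the difference, here is a minimal sketch. The helper name read_audio_bytes, the stand-in file contents (not a real FLAC stream), and the endpoint URL are all assumptions made for the example and are not from the question. It reads a file in binary mode, shows that text mode raises the same kind of error on the same bytes, and builds a request body from the bytes without sending anything.

```python
import os
import tempfile
from urllib.request import Request

def read_audio_bytes(path):
    """Return the raw file contents as bytes, with no text decoding."""
    with open(path, 'rb') as f:
        return f.read()

# Stand-in data for the demo: byte 0x80 at position 9 is not a valid
# UTF-8 start byte, as in the traceback from the question.
sample = b'fLaC\x00\x00\x00\x22\x10\x80\x00'
with tempfile.NamedTemporaryFile(suffix='.flac', delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

# Binary mode: the bytes come back unchanged.
data = read_audio_bytes(sample_path)

# Text mode: Python tries to decode the bytes and fails.
try:
    with open(sample_path, encoding='utf-8') as f:
        f.read()
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True

os.remove(sample_path)

# In Python 3 the request body passed as data should be bytes.
# The URL below is a placeholder assumption; no request is sent here.
req = Request(
    'https://example.invalid/recognize',
    data=data,
    headers={'Content-Type': 'audio/x-flac; rate=16000'},
)
```

In Python 3, Request lives in urllib.request and URLError in urllib.error, so the later lines of the question's script would likely need those imports as well.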

Upvotes: 5
