Reputation: 1358
I'm trying to get my speech recognition script working but it can't understand me.
import pyaudio
import speech_recognition as sr
def initSpeech():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source, duration=2)
        print("Set minimum energy threshold to {}".format(r.energy_threshold))
        print("Say something")
        audio = r.listen(source, phrase_time_limit=10)
    command = ""
    try:
        command = r.recognize_google(audio)
    except:
        print("Coundn't understand you!")
    print(command)
initSpeech()
This is my code to recognize my voice, but it always prints "Coundn't understand you!".
When I record my voice with the following Python script and use the resulting WAV file as input for the speech recognition, it works fine:
import pyaudio
import wave
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)
print("* recording")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("* done recording")
stream.stop_stream()
stream.close()
p.terminate()
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
This script records my voice, and I then use the resulting file "output.wav" as input for the speech recognition.
EDIT:
With,
with open("microphone-results.wav", "wb") as f:
    f.write(audio.get_wav_data())
I saved the audio that gets analyzed, and it sounded really bad: low and slow, like a voice changer in a bad movie. Maybe this is a hint toward the solution. I already checked the chunk_size and sample_rate settings; they are identical to the settings in my recording script above. My system: Windows 10
There is also an issue on GitHub: issue 358
Python: 3.6
Thank you for your help!
Upvotes: 9
Views: 5981
Reputation: 13666
Your audio is obviously not being recorded properly, and this leads to the recognition failure. My guess is that r.adjust_for_ambient_noise is failing you (automatic speech/silence detectors are not simple to implement). Start by removing that line and setting the threshold manually:
r.energy_threshold = 50
r.dynamic_energy_threshold = False
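To make that change concrete, here is a minimal sketch of the question's function with the ambient-noise calibration removed and the threshold pinned. The helper names use_fixed_threshold and listen_once are made up for illustration, and 50 is only the starting value suggested above, not a tuned one. Catching sr.UnknownValueError and sr.RequestError separately, instead of a bare except, is my addition so that a network problem is not mistaken for unintelligible audio.

```python
def use_fixed_threshold(recognizer, threshold=50):
    """Turn off automatic threshold adjustment and pin the energy threshold."""
    recognizer.dynamic_energy_threshold = False
    recognizer.energy_threshold = threshold
    return recognizer


def listen_once(threshold=50, phrase_time_limit=10):
    """Record one phrase from the default microphone and return the transcript."""
    # Imported here so the helper above can be used without audio hardware.
    import speech_recognition as sr

    r = use_fixed_threshold(sr.Recognizer(), threshold)
    with sr.Microphone() as source:
        print("Say something")
        audio = r.listen(source, phrase_time_limit=phrase_time_limit)

    try:
        return r.recognize_google(audio)
    except sr.UnknownValueError:
        # The service answered but could not transcribe the audio.
        print("Audio was captured but could not be transcribed")
    except sr.RequestError as e:
        # The service could not be reached or rejected the request.
        print("Could not reach the recognition service: {}".format(e))
    return ""
```

Calling print(listen_once()) should then behave like initSpeech() from the question, only without the calibration step.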
After that, save the recorded audio to a .wav file and listen to it. You have to get clear audio before you send it to the ASR engine.
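Besides listening, it can help to look at what format actually ended up in the file. The sketch below saves the captured audio and reads back the WAV header with the standard wave module. The helper names save_audio and describe_wav are hypothetical. Audio that plays back slow and low-pitched is what you would hear if the samples were captured at a higher rate than the one stored in the header, so comparing the reported sample rate with what the device really delivers is one thing worth checking, though it may not be the cause here.

```python
import wave


def save_audio(audio, path):
    """Write a speech_recognition AudioData object to a WAV file."""
    with open(path, "wb") as f:
        f.write(audio.get_wav_data())


def describe_wav(path):
    """Read the WAV header so the stored format can be checked."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / float(rate),
        }
```

For example, after save_audio(audio, "microphone-results.wav"), compare describe_wav("microphone-results.wav") with describe_wav("output.wav") from the recording script that works.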
Also, I recommend making sure that you are using the microphone you intended to use:
print(sr.Microphone.list_microphone_names()[0])
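If the first entry is not the device you expect, you can print the whole list with its indices and pass the right one as device_index. A rough sketch, assuming you can recognize your microphone by part of its name. The helpers find_device_index and open_microphone are made up for illustration, and the keyword is whatever matches your hardware:

```python
def find_device_index(names, keyword):
    """Return the index of the first device whose name contains keyword, else None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if name is not None and keyword in name.lower():
            return index
    return None


def open_microphone(keyword):
    """Print all device names and open the first one matching keyword."""
    # Imported here so find_device_index can be used without audio hardware.
    import speech_recognition as sr

    names = sr.Microphone.list_microphone_names()
    for index, name in enumerate(names):
        print("{}: {}".format(index, name))

    index = find_device_index(names, keyword)
    # device_index=None makes the library use the system default input device.
    return sr.Microphone(device_index=index)
```

You would then write with open_microphone("usb") as source: instead of with sr.Microphone() as source: in the recording code.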
Upvotes: 10