Reputation: 413
I've recently been working on using a speech recognition library in Python to launch applications. I intend to ultimately use the library for voice-activated home automation using the Raspberry Pi GPIO.
I have this working: it detects my voice and launches the application. The problem is that it seems to hang on the one word I say (for example, I say "internet" and it launches Chrome an infinite number of times).
This is unusual behavior from what I have seen of while loops. I can't figure out how to stop it looping. Do I need to do something outside the loop to make it work properly? Please see the code below.
import pyaudio, os
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    audio = r.listen(source)

def excel():
    os.system("start excel.exe")

def internet():
    os.system("start chrome.exe")

def media():
    os.system("start wmplayer.exe")

def mainfunction():
    user = r.recognize(audio)
    print(user)
    if user == "Excel":
        excel()
    elif user == "Internet":
        internet()
    elif user == "music":
        media()

while 1:
    mainfunction()
Upvotes: 5
Views: 29245
Reputation: 2637
Unfortunately, you have to initialise the microphone on every pass through the loop. This module also has r.adjust_for_ambient_noise(source), which sets the energy threshold so that it understands your voice in a noisy room too. Setting the threshold takes time, so it can miss some of your words if you are giving commands continuously.
import pyaudio, os
import speech_recognition as sr

r = sr.Recognizer()

def excel():
    os.system("start excel.exe")

def internet():
    os.system("start chrome.exe")

def media():
    os.system("start wmplayer.exe")

def mainfunction():
    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source)
        audio = r.listen(source)
    user = r.recognize(audio)
    print(user)
    if user == "Excel":
        excel()
    elif user == "Internet":
        internet()
    elif user == "music":
        media()

while 1:
    mainfunction()
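To make the "threshold" idea a bit more concrete, here is a rough sketch of what an energy threshold does. This is not the library's actual implementation; rms, calibrate_threshold, is_speech and the margin value are all made-up names and numbers for illustration. It assumes the audio is 16-bit little-endian mono PCM.

```python
import math
import struct


def rms(frame_bytes):
    """Root-mean-square amplitude of a frame of 16-bit little-endian mono PCM."""
    count = len(frame_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame_bytes[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def calibrate_threshold(ambient_frames, margin=1.5):
    """Pick a threshold a fixed margin above the loudest ambient frame.

    The margin of 1.5 is an arbitrary illustrative value.
    """
    return margin * max(rms(frame) for frame in ambient_frames)


def is_speech(frame_bytes, threshold):
    """Treat a frame as speech only if it is louder than the threshold."""
    return rms(frame_bytes) > threshold
```

Calibration has to sample some background audio before it can pick a value, which is why doing it on every pass costs listening time.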
Upvotes: 0
Reputation: 698
I've spent a lot of time working on this subject.
Currently I'm developing a Python 3 open-source cross-platform virtual assistant program called Athena Voice: https://github.com/athena-voice/athena-voice-client
Users can use it much like Siri, Cortana, or Amazon Echo.
It also uses a very simple "module" system where users can easily write their own modules to enhance its functionality. Let me know if that could be of use.
Otherwise, I recommend looking into Pocketsphinx and Google's Python speech-to-text/text-to-speech packages.
On Python 3.4, Pocketsphinx can be installed with:
pip install pocketsphinx
However, you must install the PyAudio dependency separately (unofficial download): http://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio
Both Google packages can be installed with:
pip install SpeechRecognition gTTS
Google STT: https://pypi.python.org/pypi/SpeechRecognition/
Google TTS: https://pypi.python.org/pypi/gTTS/1.0.2
Pocketsphinx should be used for offline wake-up-word recognition, and Google STT should be used for active listening.
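The split described above can be sketched as a small two-stage loop. This is only an outline under assumptions: two_stage_listen, is_wake_word and transcribe are hypothetical names, not part of Pocketsphinx or SpeechRecognition. In practice is_wake_word would wrap the offline keyword spotter and transcribe would wrap the online speech-to-text call.

```python
def two_stage_listen(chunks, is_wake_word, transcribe):
    """Yield transcripts only for chunks that directly follow a wake word.

    chunks       -- iterable of audio chunks (any type the callbacks accept)
    is_wake_word -- cheap offline check, called while idle
    transcribe   -- expensive online call, made only after a wake word
    """
    awake = False
    for chunk in chunks:
        if not awake:
            # Idle: only run the cheap offline detector.
            awake = is_wake_word(chunk)
        else:
            # Active: send exactly one chunk to the online recognizer,
            # then go back to waiting for the wake word.
            text = transcribe(chunk)
            awake = False
            if text:
                yield text
```

The point of the design is that nothing is sent to the online service until the wake word has been heard locally.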
Upvotes: 1
Reputation: 25210
Just in case, here is an example of how to listen continuously for a keyword with pocketsphinx. This is going to be much easier than sending audio to Google continuously, and it gives you a far more flexible solution.
import os
import pyaudio
from pocketsphinx import *

modeldir = "/usr/local/share/pocketsphinx/model"

# Create a decoder with a specific acoustic model and dictionary
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'hmm/en_US/hub4wsj_sc_8k'))
config.set_string('-dict', os.path.join(modeldir, 'lm/en_US/cmu07a.dic'))
# Keyphrase to spot and its detection threshold
config.set_string('-keyphrase', 'oh mighty computer')
config.set_float('-kws_threshold', 1e-40)

decoder = Decoder(config)
decoder.start_utt('spotting')

# Open the default microphone as a 16 kHz, 16-bit mono input stream
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
stream.start_stream()

while True:
    buf = stream.read(1024)
    decoder.process_raw(buf, False, False)
    hyp = decoder.hyp()
    if hyp is not None and hyp.hypstr == 'oh mighty computer':
        print("Detected keyword, restarting search")
        # Restart the utterance so the keyword can be detected again
        decoder.end_utt()
        decoder.start_utt('spotting')
Upvotes: 9
Reputation: 94871
The problem is that you only actually listen for speech once at the beginning of the program, and then just repeatedly call recognize on the same bit of saved audio. Move the code that actually listens for speech into the while loop:
import pyaudio, os
import speech_recognition as sr

def excel():
    os.system("start excel.exe")

def internet():
    os.system("start chrome.exe")

def media():
    os.system("start wmplayer.exe")

def mainfunction(source):
    audio = r.listen(source)
    user = r.recognize(audio)
    print(user)
    if user == "Excel":
        excel()
    elif user == "Internet":
        internet()
    elif user == "music":
        media()

if __name__ == "__main__":
    r = sr.Recognizer()
    with sr.Microphone() as source:
        while 1:
            mainfunction(source)
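If it helps, the same idea (get a fresh phrase on every pass, then dispatch on it) can be sketched without any audio hardware. The names run_commands, listen_once, handlers and max_turns, and the "stop" command, are all invented for this sketch. It also lower-cases the phrase, on the assumption that you don't want "Excel" and "excel" treated differently.

```python
def run_commands(listen_once, handlers, max_turns=None):
    """Fetch a new phrase on every iteration and dispatch it to a handler.

    listen_once -- callable returning the recognized text, or None if
                   nothing intelligible was heard
    handlers    -- dict mapping lower-case command words to callables
    max_turns   -- optional cap on iterations (None means loop until "stop")
    Returns the list of commands that were actually dispatched.
    """
    dispatched = []
    turns = 0
    while max_turns is None or turns < max_turns:
        turns += 1
        phrase = listen_once()          # new audio each time round the loop
        if phrase is None:
            continue                    # nothing recognized, listen again
        key = phrase.strip().lower()
        if key == "stop":
            break
        action = handlers.get(key)
        if action is not None:
            action()
            dispatched.append(key)
    return dispatched
```

With the real library, listen_once would wrap r.listen(source) plus the recognize call and return None when the recognizer raises because it could not understand the audio. In current SpeechRecognition releases the recognize call is r.recognize_google(audio) and the exception is sr.UnknownValueError.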
Upvotes: 9