darthcrumpet

Reputation: 413

Python Voice Recognition Library - Always Listen?

I've recently been working with a speech recognition library in Python to launch applications. I ultimately intend to use the library for voice-activated home automation using the Raspberry Pi GPIO.

I have this working: it detects my voice and launches the application. The problem is that it seems to hang on the one word I say (for example, I say "internet" and it launches Chrome an infinite number of times).

This is unusual behavior from what I have seen of while loops. I can't figure out how to stop it looping. Do I need to do something outside the loop to make it work properly? Please see the code below.

http://pastebin.com/auquf1bR

import pyaudio,os
import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
        audio = r.listen(source)

def excel():
        os.system("start excel.exe")

def internet():
        os.system("start chrome.exe")

def media():
        os.system("start wmplayer.exe")

def mainfunction():
        user = r.recognize(audio)
        print(user)
        if user == "Excel":
                excel()
        elif user == "Internet":
                internet()
        elif user == "music":
                media()
while 1:
        mainfunction()

Upvotes: 5

Views: 29245

Answers (4)

Muhammad Naufil

Reputation: 2637

Unfortunately, you have to initialise the microphone on every loop iteration. This module also provides r.adjust_for_ambient_noise(source), which sets the threshold so that it understands your voice in a noisy room too. Setting the threshold takes time, so some of your words can be skipped if you are giving commands continuously.

import pyaudio,os
import speech_recognition as sr
r = sr.Recognizer()


def excel():
        os.system("start excel.exe")

def internet():
        os.system("start chrome.exe")

def media():
        os.system("start wmplayer.exe")

def mainfunction():
        with sr.Microphone() as source:
            r.adjust_for_ambient_noise(source)
            audio = r.listen(source)
        user = r.recognize(audio)
        print(user)
        if user == "Excel":
                excel()
        elif user == "Internet":
                internet()
        elif user == "music":
                media()
while 1:
        mainfunction()
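
If the calibration delay described above becomes a problem, one possible variation is to calibrate once and then listen repeatedly on the same source. The following is only a sketch under assumptions: `listen_loop`, `recognize`, `handle`, and `max_commands` are hypothetical names introduced for illustration, and the recognizer is passed in rather than created here so that the loop can be tried without a microphone.

```python
def listen_loop(recognizer, source, recognize, handle, max_commands=None):
    """Calibrate once, then listen and dispatch repeatedly.

    recognizer   -- object with adjust_for_ambient_noise(source) and listen(source)
    source       -- an already opened audio source (e.g. a microphone)
    recognize    -- callable that turns captured audio into text (assumption)
    handle       -- callable that acts on the recognized text (assumption)
    max_commands -- stop after this many commands; None means loop forever
    """
    # One-time calibration instead of recalibrating on every iteration.
    recognizer.adjust_for_ambient_noise(source)
    handled = 0
    while max_commands is None or handled < max_commands:
        audio = recognizer.listen(source)  # capture a fresh phrase each time
        handle(recognize(audio))
        handled += 1
```

With the code above, this could be called as `listen_loop(r, source, r.recognize, print)` inside a `with sr.Microphone() as source:` block. The trade-off is that a single calibration will not follow changes in background noise over time.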

Upvotes: 0

Connor

Reputation: 698

I've spent a lot of time working on this subject.

Currently I'm developing a Python 3 open-source cross-platform virtual assistant program called Athena Voice: https://github.com/athena-voice/athena-voice-client

Users can use it much like Siri, Cortana, or Amazon Echo.

It also uses a very simple "module" system where users can easily write their own modules to enhance its functionality. Let me know if that could be of use.

Otherwise, I recommend looking into Pocketsphinx and Google's Python speech-to-text/text-to-speech packages.

On Python 3.4, Pocketsphinx can be installed with:

pip install pocketsphinx

However, you must install the PyAudio dependency separately (unofficial download): http://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio

Both Google packages can be installed by using the command:

pip install SpeechRecognition gTTS

Google STT: https://pypi.python.org/pypi/SpeechRecognition/

Google TTS: https://pypi.python.org/pypi/gTTS/1.0.2

Pocketsphinx should be used for offline wake-up-word recognition, and Google STT should be used for active listening.
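
The split described above (offline wake word first, online transcription second) can be sketched as a simple two-stage loop. This is a minimal illustration, not the Athena Voice implementation: `assistant_loop`, `wait_for_wake_word`, `transcribe_command`, `handle`, and `max_cycles` are all hypothetical names, and the actual Pocketsphinx and Google STT calls are assumed to live inside the callables that are passed in.

```python
def assistant_loop(wait_for_wake_word, transcribe_command, handle, max_cycles=None):
    """Alternate between offline keyword spotting and online transcription.

    wait_for_wake_word -- blocks until the wake-up word is heard (offline stage)
    transcribe_command -- records one phrase and returns its text (online stage)
    handle             -- acts on the transcribed command
    max_cycles         -- stop after this many wake-ups; None means loop forever
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        wait_for_wake_word()         # cheap, local, runs continuously
        text = transcribe_command()  # one network request per wake-up
        if text:                     # ignore empty or failed transcriptions
            handle(text)
        cycles += 1
    return cycles
```

Structured this way, audio is only sent to the online service after the wake-up word has been detected locally.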

Upvotes: 1

Nikolay Shmyrev

Reputation: 25210

Just in case, here is an example of how to listen continuously for a keyword in pocketsphinx. This is going to be much easier than sending audio to Google continuously, and it gives you a more flexible solution.

import sys, os, pyaudio
from pocketsphinx import *

modeldir = "/usr/local/share/pocketsphinx/model"
# Create a decoder with certain model
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'hmm/en_US/hub4wsj_sc_8k'))
config.set_string('-dict', os.path.join(modeldir, 'lm/en_US/cmu07a.dic'))
config.set_string('-keyphrase', 'oh mighty computer')
config.set_float('-kws_threshold', 1e-40)

decoder = Decoder(config)
decoder.start_utt('spotting')

# Open a 16 kHz mono input stream on the default microphone
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
stream.start_stream()        

while True:
    buf = stream.read(1024)
    decoder.process_raw(buf, False, False)
    if decoder.hyp() is not None and decoder.hyp().hypstr == 'oh mighty computer':
        print("Detected keyword, restarting search")
        decoder.end_utt()
        decoder.start_utt('spotting')

Upvotes: 9

dano

Reputation: 94871

The problem is that you only actually listen for speech once at the beginning of the program, and then just repeatedly call recognize on the same bit of saved audio. Move the code that actually listens for speech into the while loop:

import pyaudio,os
import speech_recognition as sr


def excel():
        os.system("start excel.exe")

def internet():
        os.system("start chrome.exe")

def media():
        os.system("start wmplayer.exe")

def mainfunction(source):
    audio = r.listen(source)
    user = r.recognize(audio)
    print(user)
    if user == "Excel":
        excel()
    elif user == "Internet":
        internet()
    elif user == "music":
        media()

if __name__ == "__main__":
    r = sr.Recognizer()
    with sr.Microphone() as source:
        while 1:
            mainfunction(source)
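
As a side note, the comparisons above are case-sensitive ("Excel" and "Internet" are capitalised, "music" is not), so a command may be missed if the recognizer returns different capitalisation. One way to make the matching more forgiving is a lookup table keyed by lowercase text. This is only a sketch: `COMMANDS`, `resolve_command`, and `handle` are hypothetical names, and the shell commands are the Windows ones from the question.

```python
import os

# Hypothetical mapping from a spoken keyword to the command that launches it.
COMMANDS = {
    "excel": "start excel.exe",
    "internet": "start chrome.exe",
    "music": "start wmplayer.exe",
}

def resolve_command(text):
    """Return the shell command for a recognized phrase, or None if unknown."""
    return COMMANDS.get(text.strip().lower())

def handle(text, run=os.system):
    """Launch the application matching `text`; return True if one was found."""
    command = resolve_command(text)
    if command is None:
        return False
    run(command)
    return True
```

Inside `mainfunction`, the if/elif chain could then be replaced with a single `handle(user)` call, and adding a new voice command becomes a one-line change to the table.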

Upvotes: 9
