Reputation: 49
I've been experimenting with the Python speech recognition library (https://pypi.python.org/pypi/SpeechRecognition/) to read downloaded versions of the BBC shipping forecast. The clipping of those files from live radio to iPlayer is obviously automated and not very accurate, so there is usually some audio before the forecast itself starts: a trailer, or the end of the news. I don't need to be that accurate, but I'd like to get speech recognition to recognise the phrase "and now the shipping forecast" (or just 'shipping' would do, actually) and cut the file from there.
My code so far (adapted from an example) transcribes an audio file of the forecast and uses a formula (based on 200 words per minute, i.e. 0.3 seconds per word) to predict where the word 'shipping' comes, but it's not proving to be very accurate.
Is there a way of getting the actual 'frame' or onset time in seconds that pocketsphinx itself detected for that word? I can't find anything in the documentation. Anyone any ideas?
from os import path

import speech_recognition as sr

AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "test_short2.wav")

# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file

# recognize speech using Sphinx
try:
    print("Sphinx thinks you said ")
    returnedSpeech = str(r.recognize_sphinx(audio))
    wordsList = returnedSpeech.split()
    print(returnedSpeech)
    # 200 words per minute is 0.3 seconds per word
    print("predicted location of start {0}".format(wordsList.index("shipping") * 0.3))
except ValueError:
    # list.index raises ValueError when the word is not in the transcript
    print("'shipping' was not found in the transcript")
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))
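For reference, the word-rate estimate described above can be pulled out into a small helper. This is only a sketch: `estimate_onset_seconds` is a hypothetical name, and the constant speaking rate is an assumption. That assumption is probably why the prediction drifts, since it ignores pauses, jingles and any words the recogniser drops or inserts.

```python
def estimate_onset_seconds(transcript, target, words_per_minute=200):
    """Estimate when `target` is spoken, assuming a constant speaking rate.

    Returns None if `target` does not appear in the transcript.
    """
    words = transcript.lower().split()
    if target not in words:
        return None
    seconds_per_word = 60.0 / words_per_minute  # 200 wpm gives 0.3 s per word
    # index() gives the number of words spoken before the target
    return words.index(target) * seconds_per_word


if __name__ == "__main__":
    print(estimate_onset_seconds("and now the shipping forecast", "shipping"))
```

The estimate only counts recognised words, so any silence or music before the phrase is not accounted for at all.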
Upvotes: 1
Views: 3169
Reputation: 25210
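You need to use the pocketsphinx API directly for such things. It is also highly recommended to read the pocketsphinx documentation on keyword spotting.
You can spot a keyphrase as demonstrated in this example, where seg.start_frame and seg.end_frame give the frames at which the keyphrase was detected: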
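import os

from pocketsphinx.pocketsphinx import Decoder

# modeldir and datadir are placeholders: point them at your own
# acoustic model directory and audio directory
modeldir = "path/to/model"
datadir = "path/to/data"

# create a decoder in keyphrase-spotting mode
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'en-us/en-us'))
config.set_string('-dict', os.path.join(modeldir, 'en-us/cmudict-en-us.dict'))
config.set_string('-keyphrase', 'shipping forecast')
config.set_float('-kws_threshold', 1e-30)

# open the audio file to read the data
stream = open(os.path.join(datadir, "test_short2.wav"), "rb")

# process the audio chunk by chunk; on detection, report it and restart the search
decoder = Decoder(config)
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
    if decoder.hyp() is not None:
        print([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
        print("Detected keyphrase, restarting search")
        decoder.end_utt()
        decoder.start_utt()
stream.close()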
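To get from a detected frame to a cut point in the file, something like the following might do. It is a rough sketch using only the standard `wave` module, and it assumes pocketsphinx's default rate of 100 frames per second (adjust it if you changed `-frate`). The names `frame_to_seconds` and `trim_wav` are made up for illustration. Note also that, as far as I understand it, frame numbers are counted from the most recent `start_utt()`, so if the search has already been restarted you would need to add the frames consumed by earlier utterances.

```python
import wave

FRAMES_PER_SECOND = 100  # assumed pocketsphinx default frame rate


def frame_to_seconds(frame, frames_per_second=FRAMES_PER_SECOND):
    """Convert a decoder frame index to seconds from the start of the utterance."""
    return frame / float(frames_per_second)


def trim_wav(src_path, dst_path, start_seconds):
    """Copy src_path to dst_path, dropping all audio before start_seconds."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # clamp the cut point to the valid sample range of the file
        start_sample = int(start_seconds * src.getframerate())
        start_sample = max(0, min(start_sample, src.getnframes()))
        src.setpos(start_sample)
        remaining = src.readframes(src.getnframes() - start_sample)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # the frame count in the header is corrected when the data is written
        dst.writeframes(remaining)
```

With the segment list printed by the decoder, usage would be along the lines of `trim_wav("test_short2.wav", "forecast_only.wav", frame_to_seconds(seg.start_frame))`, where the output filename is just an example.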
Upvotes: 1