Reputation: 1533
I am working on a product that needs to take input from users and perform certain actions based on it. We have implemented it as a chat box with typed input, and it serves our purpose. For future releases we want to add voice recognition to the chat window. We thought of using
window.SpeechRecognition || window.webkitSpeechRecognition
but we came to know that the functionality available in browsers uses Google's Cloud Speech API. As we deal with very sensitive user information, this would be a security issue. Are there any alternatives for implementing speech recognition that work in any browser?
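For reference, a minimal sketch of how the browser API mentioned above is usually detected. The helper name `getSpeechRecognition` is made up for illustration; only `SpeechRecognition` and `webkitSpeechRecognition` are real browser globals, and in Chrome the recognition itself is still performed by a remote service.

```javascript
// Returns the SpeechRecognition constructor exposed by the given global
// object, or null when the browser does not provide one.
// `getSpeechRecognition` is a hypothetical helper, not part of any library.
function getSpeechRecognition(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Browser usage (not executed here):
// const Recognition = getSpeechRecognition(window);
// if (Recognition) {
//   const recognizer = new Recognition();
//   recognizer.onresult = (event) => console.log(event.results[0][0].transcript);
//   recognizer.start();
// }
```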
Upvotes: 10
Views: 15763
Reputation: 5591
There isn't a great answer to this, but your best bet for offline speech recognition at the moment (Aug, 2023) is using an implementation of OpenAI's Whisper model, compiled to WebAssembly. There are three that I know of:
Note this still isn't a great option for a few reasons:
Part of the added complexity is because these types of models generally work in chunks. For longer audio files, and especially for real-time transcription, we want a continuous stream of audio to produce a continuous stream of output text. The Web Speech Recognition API handles that for you, while with Whisper you have to do the chunking yourself (and deal with things like window overlaps or corrected transcriptions of already-seen words).
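To make the chunking concrete, here is a rough sketch, not tied to any particular Whisper build. Both function names are hypothetical. The merge step is deliberately naive: it only drops words repeated exactly at the seam between two chunks and does not handle the corrected-transcription case.

```javascript
// Split a mono PCM buffer (e.g. a Float32Array) into fixed-size windows
// that overlap by `overlap` samples. Hypothetical helper for illustration.
function chunkWithOverlap(samples, chunkSize, overlap) {
  if (overlap >= chunkSize) {
    throw new RangeError('overlap must be smaller than chunkSize');
  }
  const step = chunkSize - overlap;
  const chunks = [];
  for (let start = 0; start < samples.length; start += step) {
    const end = Math.min(start + chunkSize, samples.length);
    // subarray() creates a view, so no audio data is copied.
    chunks.push({ start, samples: samples.subarray(start, end) });
    if (end === samples.length) break; // last window reached the end
  }
  return chunks;
}

// Join two word lists, dropping the longest run of words that appears
// both at the end of `previous` and at the start of `next`.
function mergeWords(previous, next) {
  const max = Math.min(previous.length, next.length);
  for (let n = max; n > 0; n--) {
    const tail = previous.slice(previous.length - n);
    if (tail.every((word, i) => word === next[i])) {
      return previous.concat(next.slice(n));
    }
  }
  return previous.concat(next);
}
```

With 16 kHz audio, something like `chunkWithOverlap(pcm, 30 * 16000, 5 * 16000)` would give 30-second windows with 5 seconds of overlap. The overlap length is an arbitrary choice here, not a recommendation.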
There is a good description of some of these issues with using lower-level speech recognition model APIs in the README of Google's Open Source Live Transcribe Speech Engine.[1]
All that to say, it would be really nice if we could just use the Web Speech Recognition API itself, with an offline browser-native model, but I haven't seen any recent movement in that direction.[2][3] If you can accept the limitations, Whisper might be a workable alternative (and if you want to make a Web Speech API polyfill, I'm sure it would be very much appreciated!)
[1]: In the announcement post for that library, Google recognized the complications in using an online system. Unfortunately, despite the name, this project isn't actually what I'd really call a "Live Transcribe Speech Engine", but instead a library to do live transcription using Google's cloud transcription API.
[2]: In fact, Chrome does ship a library for offline transcription called libSODA (Speech On-Device), but it was initially released for the Live Caption feature and still does not seem to be used for user-facing voice-to-text. Not so surprisingly, "the Speech team was concerned about unauthorized repurposing of their components", so I'd guess general availability for speech-to-text usage isn't something we can expect in the near future.
[3]: At one point Mozilla was building a speech-to-text engine called DeepSpeech to embed in Firefox, but apparently dropped development. Some former members of the DeepSpeech team forked the project and continued the work for a while as Coqui AI STT, but have since retired that effort and recommend using Whisper instead.
Upvotes: 6
Reputation: 53
Use a TensorFlow.js ("tfjs") model. It is the most sensible solution that works in the browser.
Speech Command Recognizer: a JavaScript module that enables recognition of spoken commands comprised of simple isolated English words from a small vocabulary.
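A rough sketch of how this could be wired up, assuming the `@tensorflow-models/speech-commands` package is loaded in the page and passed in as `speechCommands`. The helpers `topCommand` and `startListening`, the `onCommand` callback, and the 0.75 threshold are my own illustrative choices.

```javascript
// Pick the most likely label from a scores array, or return null when
// nothing reaches the threshold. Hypothetical helper for illustration.
function topCommand(scores, labels, threshold = 0.75) {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return scores.length > 0 && scores[best] >= threshold ? labels[best] : null;
}

// Browser-only: needs microphone access and the speech-commands library.
// `speechCommands` is assumed to be the library's namespace object.
async function startListening(speechCommands, onCommand) {
  const recognizer = speechCommands.create('BROWSER_FFT');
  await recognizer.ensureModelLoaded();
  const labels = recognizer.wordLabels();
  await recognizer.listen(async (result) => {
    const command = topCommand(result.scores, labels);
    if (command !== null) onCommand(command);
  }, { probabilityThreshold: 0.75 });
  return recognizer; // call recognizer.stopListening() to stop
}
```

Audio is processed in the page itself, but the vocabulary is limited to the isolated words the model was trained on, so this suits command-style input rather than free dictation.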
Upvotes: 4
Reputation:
Apparently PocketSphinx.js is the only available way to go as of now. It's an open-source speech-to-text engine that supports English but not many languages beyond that.
Github:
However, if you want to run your code on a single instance of an Android device (e.g. a device displayed somewhere in a public area), you can use "Download offline voice recognition language" in mobile Chrome's settings. There is no such option for the desktop browser.
Upvotes: 3
Reputation: 7938
You can try:
Upvotes: 4
Reputation: 795
You can try IBM Watson's Speech to Text service. It can be used from any browser, and you can opt out so that users' data is not logged server-side: https://console.bluemix.net/docs/services/watson/getting-started-logging.html#controlling-request-logging-for-watson-services
The demo of the service is here: https://speech-to-text-demo.ng.bluemix.net/
It works in at least Firefox and Chrome, and it is based on the following open-source SDK: https://github.com/watson-developer-cloud/speech-javascript-sdk
P.S. In the general case, when users' data is not sensitive, it is better not to opt out, so that Watson can use the data to improve service quality.
Upvotes: 0