Uzair_Abdullah

Reputation: 151

Speech-to-text Recognition is not accurate

I am trying to implement speech-to-text recognition in my React website using the react-speech-recognition package from npm. I am using the exact code given in the package description on npm.
It works with everyday speech, whatever I say, but when I introduce technical jargon, it goes way off!

Here's what I am trying to say to it (it's aviation jargon):

Cleared to enter the CTR, not above 1500 feet, join and report on a right downwind runway 19, QNH 1018, squawk 2732

This is what I get in response:

please to enter the city are not above 15 feet heart penetrate join and report on a ride on the wind blown away 9 theme

What else do I need to do to fix the accuracy of the recognition?

Upvotes: 2

Views: 1337

Answers (2)

Jayanth MKV

Reputation: 323

Here are some tricks that can help:

For Accurate and Continuous Results:

const {
  transcript,
  finalTranscript,
  listening,
  resetTranscript,
  browserSupportsSpeechRecognition,
} = useSpeechRecognition({
  lang: "en-IN", // Set the language to Indian English
  interimResults: true, // Get partial results
  continuous: true, // Enable continuous recognition
  maxAlternatives: 5, // Set the number of alternative transcriptions
});
  • SpeechRecognition.startListening({ continuous: true, language: 'en-IN' }); // Set the language to Indian English
    
  • The react-speech-recognition library allows you to specify the language and dialect for the speech recognition engine. It's essential to select the appropriate language and dialect that matches the user's speech patterns. For example, if your application is targeting users with Indian English accents, you should set the language to 'en-IN' (English - India) to optimize the recognition accuracy.
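
If several alternative transcriptions are available, one way to use them is to prefer the alternative that contains the most words you expect to hear. The following is only a minimal sketch: `EXPECTED_TERMS`, `countExpectedTerms`, and `pickBestAlternative` are hypothetical names made up for illustration, and it assumes each alternative is an object with `transcript` and `confidence` fields, as the Web Speech API's SpeechRecognitionAlternative provides.

```javascript
// Hypothetical list of words the app expects to hear (an assumption for illustration).
const EXPECTED_TERMS = ["cleared", "ctr", "qnh", "squawk", "downwind", "runway"];

// Count how many expected terms occur as whole words in a transcript.
function countExpectedTerms(transcript, terms = EXPECTED_TERMS) {
  const words = new Set(
    transcript.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)
  );
  return terms.filter((term) => words.has(term.toLowerCase())).length;
}

// Pick the alternative with the most expected terms; ties fall back to the
// engine's own confidence score. Returns undefined for an empty list.
function pickBestAlternative(alternatives, terms = EXPECTED_TERMS) {
  return [...alternatives].sort((a, b) => {
    const diff =
      countExpectedTerms(b.transcript, terms) -
      countExpectedTerms(a.transcript, terms);
    return diff !== 0 ? diff : (b.confidence ?? 0) - (a.confidence ?? 0);
  })[0];
}
```

In a browser, the alternatives for one result would presumably come from the underlying recognition object's result event (for example `Array.from(event.results[i])`), not from the hook's `transcript`. Whether more than one alternative is actually returned depends on the browser.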


For Lengthy Conversations:

const {
  transcript,
  finalTranscript,
  listening,
  resetTranscript,
  browserSupportsSpeechRecognition,
} = useSpeechRecognition({
  lang: "en-IN", // Set the language to Indian English
  interimResults: true, // Get partial results
  continuous: true, // Enable continuous recognition
  maxAlternatives: 5, // Set the number of alternative transcriptions
  abortController: new AbortController(), // Create a new AbortController instance
});


const [abortController, setAbortController] = useState(new AbortController());

const handleStopRecording = () => {
  stopRecording();
  setAbortController(new AbortController()); // Create a new AbortController instance
};

const stopRecording = () => {
  setRecordingStatus("inactive");
  abortController.abort(); // Abort the speech recognition
  SpeechRecognition.stopListening();
  // rest of the logic
};

For Lengthy and Speedy Conversations:

  • Memoize the startRecording and stopRecording functions with the useCallback hook, and move the creation of the abortController instance into the handleStartRecording and handleStopRecording functions. This ensures that a new abortController is created every time.

    const stopRecording = useCallback(() => { /* ... */ }, [abortController]);
    const startRecording = useCallback(() => { /* ... */ }, [abortController]);
    

const handleStartRecording = () => {
  startRecording();
  setAbortController(new AbortController()); // Create a new AbortController instance
};

const handleStopRecording = () => {
  stopRecording();
  setAbortController(new AbortController()); // Create a new AbortController instance
};

Upvotes: 0

StriplingWarrior

Reputation: 156624

That package leverages the Speech Recognition Interface of your browser's Web Speech API. The React Library's API allows you to get the underlying SpeechRecognition object via a call to the getRecognition() method.

The underlying SpeechRecognition object's API allows for the addition of Grammars using the JSpeech Grammar Format. Here's an example. So in theory, you could provide more information about the words you're expecting to hear in your app, and thereby improve performance.
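
As a rough sketch of what that could look like for the aviation phrases in the question: the helper names `buildJsgfGrammar` and `attachGrammar` and the word list are hypothetical, chosen only for illustration. The sketch assumes the browser exposes `SpeechGrammarList` (or the prefixed `webkitSpeechGrammarList`) with an `addFromString` method, and a browser is free to ignore the grammar entirely.

```javascript
// Hypothetical vocabulary for illustration; replace with the terms your app expects.
const AVIATION_TERMS = ["cleared", "CTR", "QNH", "squawk", "downwind", "runway"];

// Build a one-rule JSGF grammar string listing the terms as alternatives.
function buildJsgfGrammar(name, terms) {
  return `#JSGF V1.0; grammar ${name}; public <${name}> = ${terms.join(" | ")} ;`;
}

// Attach the grammar to a SpeechRecognition object if the environment supports it.
// `scope` defaults to the global object so the grammar list constructor is looked
// up where the browser defines it. Returns true if attached, false otherwise.
function attachGrammar(recognition, grammar, scope = globalThis) {
  const GrammarList = scope.SpeechGrammarList || scope.webkitSpeechGrammarList;
  if (!recognition || !GrammarList) return false;
  const list = new GrammarList();
  list.addFromString(grammar, 1); // weight 1 = highest importance
  recognition.grammars = list;
  return true;
}
```

With the React package, the `recognition` argument would be the object returned by `SpeechRecognition.getRecognition()`, passed in before listening starts, for example `attachGrammar(recognition, buildJsgfGrammar("aviation", AVIATION_TERMS))`.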

But there are caveats, including:

  • There is very limited browser support for speech recognition in general, and for the addition of grammars in particular. If you don't control which browser your users will be using, the quality of recognition will vary, and recognition might not work at all unless you use polyfills.
  • Depending on how the speech recognition is implemented, things like hardware configuration and the Operating System may impact speech recognition results.
  • Speech recognition is an extremely inexact science. The best automatic speech recognition software/services only boast about 85% accuracy, even with ordinary speech. The ones built into your browser probably won't be even that good.

You may be able to get better accuracy from cloud-based speech services. Azure Cognitive Services, for example, allows you to create custom voice models, custom grammars, etc. Of course, they also charge you based on usage, and they charge more if you're using customizations.

Upvotes: 3
