Jacob Stern

Reputation: 4597

25s Latency in Google Speech to Text

I ran into this problem using the Google Speech to Text engine. I am streaming 16-bit, 16 kHz audio in real time, in 32 kB chunks, but there is an average latency of 25 seconds between sending audio and receiving transcripts, which defeats the purpose of real-time transcription.

Why is there such high latency?

Upvotes: 3

Views: 2837

Answers (1)

Jacob Stern

Reputation: 4597

The Google Speech to Text documentation recommends using a 100 ms frame size to minimize latency.

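A 32 kB chunk of 16-bit, 16 kHz audio (assuming a single channel) holds a full second, ten times the recommended frame size:

32 kB * (8 bits / 1 byte) * (1 sample / 16 bits) * (1 sec / 16000 samples) = 1 sec.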

So try sending 3.2 kB chunks (100 ms of audio each) instead. For me, that dropped the average latency from 25 s to ~4 s.
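The same arithmetic can be sketched in Python for other sample rates or frame sizes. This is only an illustration and does not call the Speech to Text API; `chunk_size_bytes` and `iter_chunks` are hypothetical helper names, and mono audio is assumed unless `channels` says otherwise.

```python
def chunk_size_bytes(sample_rate_hz, bits_per_sample, frame_ms, channels=1):
    """Bytes of raw PCM needed to hold frame_ms milliseconds of audio."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * frame_ms // 1000


def iter_chunks(pcm, size):
    """Yield consecutive slices of at most `size` bytes from a PCM buffer."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


# 16-bit, 16 kHz mono audio with a 100 ms frame.
size = chunk_size_bytes(16000, 16, 100)

# One second of silence (32000 bytes) stands in for real audio here.
one_second = bytes(32000)
chunks = list(iter_chunks(one_second, size))

print(size, len(chunks))
```

Each yielded chunk would then go into its own streaming request, in place of the original 32 kB buffer.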

Upvotes: 7
