ar9

Reputation: 11

Use WebRTC Voice Activity Detector (VAD) on live audio coming from VOIP streamed by Websockets

I'm trying to set up the WebRTC Voice Activity Detector (VAD) on a VOIP call that is being streamed over a websocket, to detect when the caller has stopped talking.

Most of the tutorials and questions about WebRTC VAD are based on recorded audio files rather than live streams. I would like to know how to apply it to a websocket streaming a VOIP call in real time.

According to the py-webrtcvad documentation (https://pypi.org/project/webrtcvad/):

Give it a short segment (“frame”) of audio. The WebRTC VAD only accepts 16-bit mono PCM audio, sampled at 8000, 16000, or 32000 Hz. A frame must be either 10, 20, or 30 ms in duration

How would one go about converting a raw audio stream coming from the websocket into the format the WebRTC VAD requires?
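As a sketch of the conversion step: telephony VOIP streams are very often G.711 µ-law at 8000 Hz (this is an assumption about your stream; if your provider sends a different codec, this decode step differs). Each µ-law byte expands to one signed 16-bit PCM sample, which is exactly the format the VAD accepts. The function names below are illustrative, not from any particular library:

```python
import struct

def ulaw_byte_to_pcm16(b: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~b & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    # Standard G.711 expansion: add the bias, shift by the exponent, remove the bias
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample

def ulaw_chunk_to_pcm16(chunk: bytes) -> bytes:
    """Convert a mu-law websocket payload to little-endian 16-bit mono PCM."""
    return b"".join(struct.pack("<h", ulaw_byte_to_pcm16(b)) for b in chunk)
```

After this conversion the audio is 16-bit mono PCM at 8000 Hz, one of the sample rates the VAD supports, so no resampling is needed; it only remains to slice the stream into 10/20/30 ms frames.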

Upvotes: 1

Views: 2856

Answers (1)

baobui

Reputation: 21

To use the VAD, you need to feed it chunks of exactly the right size. https://github.com/wiseman/py-webrtcvad/issues/30

- For example, if your sample rate is 16000 Hz, then the only allowed
frame/chunk sizes are:
    16000 * ({10, 20, 30} / 1000) = 160, 320, or 480 samples.
- Since each sample is 2 bytes (16 bits), the only allowed frame/chunk sizes are
320, 640, or 960 bytes.
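Since websocket messages rarely arrive in those exact sizes, one approach is to buffer incoming bytes and emit fixed-size frames as they fill up. A minimal sketch, assuming 16-bit mono PCM at 8000 Hz and 30 ms frames (the `FrameBuffer` class and the commented `webrtcvad` usage are illustrative):

```python
SAMPLE_RATE = 8000          # Hz; must be 8000, 16000, or 32000 for the VAD
FRAME_MS = 30               # ms; must be 10, 20, or 30
BYTES_PER_SAMPLE = 2        # 16-bit PCM
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * BYTES_PER_SAMPLE  # 480 bytes

class FrameBuffer:
    """Accumulates arbitrary-sized PCM chunks and yields fixed-size VAD frames."""

    def __init__(self, frame_bytes: int = FRAME_BYTES):
        self._buf = bytearray()
        self._frame_bytes = frame_bytes

    def push(self, chunk: bytes):
        """Add a chunk; yield every complete frame now available."""
        self._buf.extend(chunk)
        while len(self._buf) >= self._frame_bytes:
            frame = bytes(self._buf[: self._frame_bytes])
            del self._buf[: self._frame_bytes]
            yield frame

# Usage with py-webrtcvad (assumes `pip install webrtcvad`):
#
#   import webrtcvad
#   vad = webrtcvad.Vad(2)              # aggressiveness 0-3
#   frames = FrameBuffer()
#   async for message in websocket:     # raw 16-bit mono PCM payloads
#       for frame in frames.push(message):
#           if not vad.is_speech(frame, SAMPLE_RATE):
#               ...                     # caller may have stopped talking
```

In practice you would also debounce the result (e.g. require several consecutive non-speech frames) before deciding the caller has stopped talking, since the VAD is noisy frame-to-frame.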

Upvotes: 2
