Tobias Philipp

Reputation: 93

Detect multiple voices without speech recognition

Is there a way to simply detect, in real time, whether multiple people are speaking? Do I need a voice recognition API for that?

I don't want to separate the audio, and I don't want to transcribe it either. My approach would be to record frequently using one mic (i.e. mono) and then analyse those recordings. But how would I then detect and distinguish voices? I'd narrow it down by looking only at the relevant frequencies, but then...
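To make the "look only at the relevant frequencies" step concrete, here is a minimal sketch that assumes NumPy and a mono signal already loaded as an array. The 25 ms frames, the 10 ms hop and the band edges between 300 and 3400 Hz (roughly the telephone speech band) are illustrative assumptions, and `band_log_energies` is a hypothetical helper. It only turns a recording into per-frame features; on its own it does not tell you how many people are speaking.

```python
import numpy as np


def band_log_energies(signal, sample_rate, frame_ms=25.0, hop_ms=10.0,
                      band_edges_hz=(300, 600, 1000, 1600, 2400, 3400)):
    """Split a mono signal into overlapping frames and return, for each frame,
    the log energy in every band between consecutive entries of band_edges_hz."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_bands = len(band_edges_hz) - 1
    if len(signal) < frame_len:
        return np.empty((0, n_bands))  # too short for a single frame
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)  # taper frame edges to reduce spectral leakage
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    feats = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = signal[i * hop: i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        for b in range(n_bands):
            # keep only FFT bins inside this band; bins outside every band are ignored
            in_band = (freqs >= band_edges_hz[b]) & (freqs < band_edges_hz[b + 1])
            feats[i, b] = np.log(power[in_band].sum() + 1e-10)
    return feats
```

Each row of the result is one frame, which is the kind of input a frame-level classifier would consume.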

I do understand that this is no trivial undertaking. That's why I hope there's an API out there capable of doing this out of the box, preferably a mobile/web-friendly one.

Now this might sound like a Christmas wish list, but as mentioned, I don't need to know anything about the content. So my guess is that full-fledged speech recognition would take a heavy toll on performance.

Upvotes: 2

Views: 2830

Answers (1)

Nikolay Shmyrev

Reputation: 25220

Most similar problems (adult/child classifier, speech/music classifier, single voice / voice mixture classifier) are standard machine learning problems. You can solve them with a classifier such as a GMM (Gaussian mixture model). You only need to construct training data for your task:

  1. Take some clean recordings of single speakers; for example, you can download audiobooks.
  2. Prepare mixed data by mixing clean recordings together.
  3. Train a GMM classifier on each set (clean and mixed).
  4. Compare the probabilities from the clean-speech GMM and the mixed-speech GMM, and decide whether a mixture is present from the ratio of the two.
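The four steps above can be sketched as follows. This is a minimal, self-contained illustration assuming NumPy; it is not the code from the linked repository. `mix`, `DiagGMM` and `is_mixture` are hypothetical names, the GMM is a small hand-written diagonal-covariance EM rather than a library implementation, and it operates on per-frame feature vectors (for example MFCCs) that are assumed to be extracted separately. The threshold of 0 on the average log-likelihood ratio is also an assumption; in practice you would tune it on held-out data.

```python
import numpy as np


def mix(a, b, gain_b=1.0):
    """Step 2: overlay two clean mono recordings, truncated to the shorter one."""
    n = min(len(a), len(b))
    return np.asarray(a[:n], dtype=float) + gain_b * np.asarray(b[:n], dtype=float)


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True)), axis=axis)


class DiagGMM:
    """Step 3: a minimal diagonal-covariance Gaussian mixture trained with EM."""

    def __init__(self, n_components=4, n_iter=50, seed=0, reg=1e-6):
        self.n_components, self.n_iter, self.seed, self.reg = n_components, n_iter, seed, reg

    def _log_joint(self, X):
        # log w_k + log N(x | mean_k, diag(var_k)) for every frame x and component k
        d = X.shape[1]
        diff = X[:, None, :] - self.means[None, :, :]
        mahal = (diff ** 2 / self.vars[None, :, :]).sum(axis=2)
        log_det = np.log(self.vars).sum(axis=1)
        return np.log(self.weights)[None, :] - 0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + mahal)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        n, _ = X.shape
        rng = np.random.default_rng(self.seed)
        # Initialise means on random frames and variances on the global variance
        self.means = X[rng.choice(n, self.n_components, replace=False)]
        self.vars = np.tile(X.var(axis=0) + self.reg, (self.n_components, 1))
        self.weights = np.full(self.n_components, 1.0 / self.n_components)
        for _ in range(self.n_iter):
            # E-step: responsibility of each component for each frame
            log_joint = self._log_joint(X)
            resp = np.exp(log_joint - logsumexp(log_joint, axis=1)[:, None])
            # M-step: re-estimate weights, means and diagonal variances
            nk = resp.sum(axis=0) + 1e-10
            self.weights = nk / n
            self.means = (resp.T @ X) / nk[:, None]
            self.vars = np.maximum((resp.T @ (X ** 2)) / nk[:, None] - self.means ** 2, self.reg)
        return self

    def score(self, X):
        """Average per-frame log-likelihood of X under the model."""
        return float(logsumexp(self._log_joint(np.asarray(X, dtype=float)), axis=1).mean())


def is_mixture(frames, clean_gmm, mixed_gmm, threshold=0.0):
    """Step 4: average log-likelihood ratio (mixed vs. clean); above threshold means 'mixture'."""
    llr = mixed_gmm.score(frames) - clean_gmm.score(frames)
    return llr > threshold, llr
```

Averaging log-likelihoods instead of multiplying raw per-frame probabilities keeps long recordings from underflowing to zero; the sign of the resulting log ratio plays the role of the probability ratio in step 4.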

You can find some code samples here:

https://github.com/littleowen/Conceptor

For example, you can try

https://github.com/littleowen/Conceptor/blob/master/Gender.ipynb

Upvotes: 2
