Audio analysis for voice, gender diarization/recognition

Reputation: 8640

Does anyone know of a library, program, or project that tries to determine how many speakers are active in an audio file, label each speaker, identify each speaker's gender, and so on?

So far I have found the following:

- Identifying segments when a person is speaking?
- Audio analysis to detect human voice, gender, age and emotion -- any prior open-source work done?

Upvotes: 2

Answers (1)

roy

Reputation: 11

The task of determining how many people speak in an audio file and assigning segments to each speaker is known as speaker diarization. Searching for this keyword turns up many research papers and some Python libraries. Most current research uses deep learning models, typically RNNs, to generate an embedding for each audio segment and then clusters the embeddings so that, ideally, each cluster corresponds to a different speaker. It is a difficult task, especially if your files are noisy. I didn't find any library or tool that was very accurate; even IBM's API is not that accurate.
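To make the embed-then-cluster idea concrete, here is a minimal sketch in plain Python. It is not taken from any library or from our API: the embeddings are hand-written stand-ins for what a neural model would produce per segment, and the `cluster_segments` helper and the 0.7 similarity threshold are hypothetical choices for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_segments(embeddings, threshold=0.7):
    """Greedy agglomerative clustering on centroid cosine similarity.

    Repeatedly merges the two most similar clusters until no pair is
    more similar than `threshold`. Returns one cluster label per segment.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            ci = centroid([embeddings[k] for k in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([embeddings[k] for k in clusters[j]])
                sim = cosine(ci, cj)
                if sim > best:
                    best, pair = sim, (i, j)
        if pair is None:  # nothing left that is similar enough to merge
            break
        i, j = pair
        clusters[i] += clusters.pop(j)

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for k in members:
            labels[k] = label
    return labels


# Toy input: (start_sec, end_sec, embedding) per speech segment.
# Real embeddings would come from a trained model, not be typed by hand.
segments = [
    (0.0, 2.1, [0.90, 0.10, 0.00]),
    (2.1, 4.0, [1.00, 0.00, 0.10]),
    (4.0, 6.5, [0.10, 0.90, 0.00]),
    (6.5, 8.0, [0.95, 0.05, 0.05]),
    (8.0, 9.7, [0.00, 1.00, 0.10]),
    (9.7, 12.0, [0.05, 0.95, 0.05]),
]

labels = cluster_segments([emb for _, _, emb in segments])
print("estimated number of speakers:", len(set(labels)))
for (start, end, _), label in zip(segments, labels):
    print(f"{start:5.1f}s - {end:5.1f}s  speaker_{label}")
```

The number of speakers falls out of the clustering: it is the number of clusters left once no pair is similar enough to merge. With noisy audio the embeddings overlap far more than in this toy input, which is a large part of why accuracy suffers in practice.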

We have developed some deep learning models of our own for this task, which are exposed through APIs. You can take a look at https://developers.deepaffects.com/ for more info. We also have gender and emotion recognition APIs.

Disclosure: I work at DeepAffects.

Upvotes: 1
