Reputation: 8640
Does anyone know of a library, program, project, etc. that tries to determine how many speakers are active in an audio file, label each speaker, identify each speaker's gender, etc.?
So far I found the following:
Upvotes: 2
Views: 890
Reputation: 11
The task of determining how many speakers are present in an audio file and assigning segments to each of them is known as speaker diarization. Searching for this keyword turns up lots of research papers and some Python libraries. Most current research uses deep learning models, typically RNNs, to generate embeddings for short stretches of audio and then clusters those embeddings into groups that ideally each belong to a different speaker. It is a difficult task, especially if your files are noisy. I didn't find any library or tool that was very accurate. Even IBM's API is not that accurate.
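To make the "embed, then cluster" idea concrete, here is a minimal sketch of the clustering half only. It assumes some model has already produced one embedding per audio segment; the toy vectors, the function names, and the 0.8 similarity threshold are all made up for illustration and are not taken from any particular library. It uses simple agglomerative clustering on cosine similarity, which is one common choice, not necessarily what any given tool does.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cluster_segments(embeddings, threshold=0.8):
    """Greedy agglomerative clustering (hypothetical helper).

    Repeatedly merges the two clusters whose centroids are most similar,
    and stops once the best pair falls below `threshold`. Returns one
    speaker label per segment; the number of distinct labels is the
    estimated number of speakers.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        best = None  # (similarity, index_a, index_b)
        for i in range(len(clusters)):
            ci = centroid([embeddings[k] for k in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([embeddings[k] for k in clusters[j]])
                sim = cosine(ci, cj)
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        if best[0] < threshold:
            break  # remaining clusters are too dissimilar to merge
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    labels = [0] * len(embeddings)
    for speaker_id, members in enumerate(clusters):
        for k in members:
            labels[k] = speaker_id
    return labels

# Toy input: (start_sec, end_sec, embedding). Real embeddings would come
# from a neural network and have hundreds of dimensions.
segments = [
    (0.0, 1.5, [1.0, 0.1, 0.0]),
    (1.5, 3.0, [0.0, 1.0, 0.1]),
    (3.0, 4.5, [0.9, 0.2, 0.0]),
    (4.5, 6.0, [0.1, 0.9, 0.0]),
]
labels = cluster_segments([emb for _, _, emb in segments])
for (start, end, _), label in zip(segments, labels):
    print(f"{start:.1f}-{end:.1f}s: speaker {label}")
print("estimated speakers:", len(set(labels)))
```

The threshold is what decides the speaker count here, and that is exactly where noisy audio hurts: noisy embeddings from the same speaker drift apart, so the clustering either splits one person into several or merges different people.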
We have developed our own deep learning models for this task, which are exposed through APIs. You can take a look at https://developers.deepaffects.com/ for more info. We also have gender and emotion recognition APIs.
Disclosure: I work at DeepAffects.
Upvotes: 1