Reputation: 23
I'm trying to build an application that performs speaker diarization using the Microsoft Cognitive Services Speaker Recognition APIs.
Looking at the sample project and reading the API documentation, I understood that recognition is done by sending a WAV file to the service, which goes against my goal of doing it in real time.
Has anyone looked into this? Is it feasible with these APIs, or should I look for another approach?
Upvotes: 2
Views: 1735
Reputation: 1287
There is no streaming approach like Google offers with its Speech API. Enrolling a new profile does not actually need 30 seconds; in my recent practice I had successful results with roughly 10 seconds. The core issue with the MS API is its handling of multiple speakers: you have to find your own way to separate them into distinct audio tracks, otherwise it will only recognize the very first known voice.
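For reference, a minimal enrollment sketch against the Speaker Recognition REST surface, assuming the legacy `spid/v1.0` endpoints, a `westus` resource, and a short mono 16 kHz PCM WAV clip; the ~10-second clip length reflects my experience above, not a documented guarantee:

    import requests

    REGION = "westus"                              # assumption: your resource's region
    BASE = f"https://{REGION}.api.cognitive.microsoft.com/spid/v1.0"
    KEY = "<your-subscription-key>"                # assumption: key from the Azure portal

    headers_json = {"Ocp-Apim-Subscription-Key": KEY, "Content-Type": "application/json"}
    headers_wav = {"Ocp-Apim-Subscription-Key": KEY, "Content-Type": "application/octet-stream"}

    # 1) Create an identification profile for one speaker.
    resp = requests.post(f"{BASE}/identificationProfiles",
                         headers=headers_json, json={"locale": "en-US"})
    resp.raise_for_status()
    profile_id = resp.json()["identificationProfileId"]

    # 2) Enroll that profile with a short clip (~10 s worked for me with shortAudio).
    with open("speaker_a_enroll.wav", "rb") as f:  # hypothetical file name
        resp = requests.post(
            f"{BASE}/identificationProfiles/{profile_id}/enroll",
            params={"shortAudio": "true"}, headers=headers_wav, data=f)
    resp.raise_for_status()
    print("enrollment accepted, poll:", resp.headers.get("Operation-Location"))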
Upvotes: 0
Reputation: 25220
Enrollment needs 30 seconds of data. Once you have a user profile you can identify users from 1-second samples, so you can do it almost in real time with a very small delay. To use this you need to set the shortAudio parameter. It's hard to imagine identification working much faster than that.
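A rough sketch of that near-real-time loop, assuming the legacy `spid/v1.0` identify endpoint and previously enrolled profile IDs; capturing and chunking the live audio is left out, and identification itself is asynchronous, so you poll the returned Operation-Location:

    import time
    import requests

    REGION = "westus"                              # assumption: your resource's region
    BASE = f"https://{REGION}.api.cognitive.microsoft.com/spid/v1.0"
    KEY = "<your-subscription-key>"                # assumption
    PROFILE_IDS = "<profileId1>,<profileId2>"      # previously enrolled speakers

    headers = {"Ocp-Apim-Subscription-Key": KEY, "Content-Type": "application/octet-stream"}

    def identify_chunk(wav_bytes):
        """Submit one short audio chunk with shortAudio=true and poll for the result."""
        resp = requests.post(
            f"{BASE}/identify",
            params={"identificationProfileIds": PROFILE_IDS, "shortAudio": "true"},
            headers=headers, data=wav_bytes)
        resp.raise_for_status()
        op_url = resp.headers["Operation-Location"]
        while True:
            status = requests.get(op_url, headers={"Ocp-Apim-Subscription-Key": KEY}).json()
            if status.get("status") in ("succeeded", "failed"):
                return status
            time.sleep(0.5)                        # small polling delay

    with open("chunk_1s.wav", "rb") as f:          # hypothetical 1-second chunk
        print(identify_chunk(f.read()))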
In case you need something different, there are open-source speech toolkits like Kaldi which can do more flexible things.
Upvotes: 1