Reputation: 1
I am using the Microsoft Speech SDK (and their sample code) to transcribe a multi-participant conversation. The transcription works fine, but it returns $ref$ instead of the userid for the people with provided signatures, and Unidentified for the people without signatures.
I am not using a Roobo device but a sound file I prepared in Audacity as eight-channel, 16-bit, 16 kHz PCM audio. The transcription does work, so I assume the sound file is not the issue. The service seems to recognise the voices tied to the signature files correctly (for instance, it switches from $ref$ to Unidentified at the right point in the text), but it seems unable to access the speaker name (userid in the model).
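A transcript that comes back at all does not by itself prove the file matches the expected layout, so a quick header check can rule out the simplest mismatch. Below is a minimal sketch using Python's standard `wave` module; the function name `check_conversation_wav` is hypothetical. It only inspects the header (channel count, sample width, sample rate), not which microphone each channel came from. Note that 8-channel files are often saved with a WAVE_FORMAT_EXTENSIBLE header, which `wave` can read only from Python 3.12 onward; older versions raise `wave.Error` on such files.

```python
import wave

# Target format discussed in this thread: 8 channels, 16-bit samples, 16 kHz.
EXPECTED_CHANNELS = 8
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 2 bytes per sample = 16-bit PCM
EXPECTED_FRAME_RATE_HZ = 16000


def check_conversation_wav(path):
    """Return a list of header mismatches; an empty list means the header looks right."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        rate = wav.getframerate()

    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"channels: {channels}, expected {EXPECTED_CHANNELS}")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width: {8 * width} bits, expected 16")
    if rate != EXPECTED_FRAME_RATE_HZ:
        problems.append(f"sample rate: {rate} Hz, expected {EXPECTED_FRAME_RATE_HZ}")
    return problems
```

Usage: `print(check_conversation_wav("conversation.wav"))`, where the file name is a placeholder. An empty list means only that the header matches; it says nothing about why the userid is returned as $ref$.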
Unfortunately, I can't find any C# code online to refer to other than the provided Microsoft sample (https://learn.microsoft.com/bs-latn-ba/azure/cognitive-services/speech-service/how-to-use-conversation-transcription-service).
I see there is a post with a similar question (but no answers) here: Azure Speech To Text: Conversation Transcribing userid always return $ref$
Has anyone attempted this and got it working?
Upvotes: 0
Views: 344
Reputation: 21
It seems the audio is not in the right format. It should be 16-bit, 16 kHz, 8 channels (Stereo Left=1, Stereo Right=2, Mono=3, Mono=4, Mono=5, Mono=6, Mono=7, Silenced Mono=8).
Here you can find enrollment_audio_steve.wav, enrollment_audio_katie.wav and the conversation katiesteve.wav. They are in the correct format. However, the service doesn't allow a signature to be created from enrollment_audio_katie.wav, so it only works with Steve.
It still seems that it only works with Speech SDK devices, but I was able to record my own audio based on that format.
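For anyone trying to reproduce that layout without a device, here is a minimal sketch using only Python's standard library. It assumes a mono, 16-bit, 16 kHz PCM recording as input and copies it into channels 1 to 7, leaving channel 8 silent as in the list above. The function name `mono_to_eight_channel` is hypothetical, and duplicating one mono signal across seven channels is an assumption: it reproduces the container layout, but a real microphone array records distinct signals, and it is not confirmed that the service treats the two the same way.

```python
import array
import sys
import wave

CHANNELS = 8
SILENT_CHANNEL_INDEX = 7  # channel 8 (0-based index 7) stays silent


def mono_to_eight_channel(src_path, dst_path):
    """Write an 8-channel 16-bit 16 kHz WAV: mono copied to channels 1-7, channel 8 zero."""
    with wave.open(src_path, "rb") as src:
        if (src.getnchannels(), src.getsampwidth(), src.getframerate()) != (1, 2, 16000):
            raise ValueError("expected mono, 16-bit, 16 kHz PCM input")
        mono = array.array("h", src.readframes(src.getnframes()))

    # WAV sample data is little-endian; convert to native order on big-endian hosts.
    if sys.byteorder == "big":
        mono.byteswap()

    # Zero-filled buffer holding len(mono) interleaved frames of 8 samples each.
    out = array.array("h", bytes(2 * CHANNELS * len(mono)))
    for frame, sample in enumerate(mono):
        base = frame * CHANNELS
        for ch in range(CHANNELS):
            if ch != SILENT_CHANNEL_INDEX:
                out[base + ch] = sample

    if sys.byteorder == "big":
        out.byteswap()  # back to little-endian for the file

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(CHANNELS)
        dst.setsampwidth(2)
        dst.setframerate(16000)
        dst.writeframes(out.tobytes())
```

The per-sample loop is slow for long recordings but keeps the interleaving explicit. Each output frame is the seven copies of the mono sample followed by one zero.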
Upvotes: 0