Andrew van Renen

Reputation: 1

Azure Conversation Transcription: userid coming out as $ref$

I am using the Microsoft speech SDK (and their sample code) to transcribe a multi-participant conversation. The transcription works fine, but it is returning $ref$ instead of the userid for the people with provided signatures and Unidentified for the people without signatures.

I am not using a Roobo but a sound file I prepared with Audacity to be eight channels of 16-bit 16 kHz PCM audio. The transcription does work, so I assume the sound file is not the issue. It seems like the service is recognising the voices tied to the signature files correctly (for instance, it switches from $ref$ to Unidentified at the right point in the text) but it seems unable to access the speaker name (userid in the model).

Unfortunately, I can't find any C# code online to refer to other than the provided Microsoft sample (https://learn.microsoft.com/bs-latn-ba/azure/cognitive-services/speech-service/how-to-use-conversation-transcription-service).

I see there is a post with a similar question (but no answers) here: Azure Speech To Text: Conversation Transcribing userid always return $ref$

Has anyone attempted this and got it working?

Upvotes: 0

Views: 344

Answers (1)

stlik

Reputation: 21

It seems the audio is not in the right format. It should be 16-bit, 16 kHz, 8 channels (Stereo Left=1, Stereo Right=2, Mono=3, Mono=4, Mono=5, Mono=6, Mono=7, Silenced Mono=8).
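To rule out a header mismatch before sending a file to the service, something like the following can be used. It is only a minimal sketch using Python's standard `wave` module, not part of the Speech SDK; the function name `check_wav_format` and the constants are my own, and the expected values are simply the ones listed above.

```python
import wave

# Expected values taken from the format described above (assumed, not read from the SDK).
EXPECTED_CHANNELS = 8
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16-bit PCM
EXPECTED_SAMPLE_RATE_HZ = 16000   # 16 kHz


def check_wav_format(path):
    """Return a list of mismatches between the WAV header and the expected format."""
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            channels = wav.getnchannels()
            width = wav.getsampwidth()
            rate = wav.getframerate()
    except wave.Error as exc:
        # Raised when the module cannot parse the header (e.g. non-PCM encodings).
        return [f"unreadable WAV header: {exc}"]

    if channels != EXPECTED_CHANNELS:
        problems.append(f"channels: {channels} (expected {EXPECTED_CHANNELS})")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width: {width * 8}-bit (expected 16-bit)")
    if rate != EXPECTED_SAMPLE_RATE_HZ:
        problems.append(f"sample rate: {rate} Hz (expected {EXPECTED_SAMPLE_RATE_HZ} Hz)")
    return problems
```

Calling `check_wav_format("katiesteve.wav")` returns an empty list when the header matches. Note that this only inspects the header: it cannot tell whether the channels are laid out in the order given above, or whether channel 8 is actually silent.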

Here you can find enrollment_audio_steve.wav, enrollment_audio_katie.wav and the conversation katiesteve.wav, which are in the correct format. However, the service does not allow a signature to be created from enrollment_audio_katie.wav, so it only works with Steve.

It still seems that it only works with Speech SDK devices, but I was able to record my own audio based on that format.

Upvotes: 0
