Triterium

Reputation: 25

How to train a custom speech model in Microsoft Cognitive Services Speech to Text

I'm doing a POC with Speech to Text. I need to recognize specific words like "D-STUM" (daily stand-up meeting). The problem is, every time I tell my program to recognize "D-STUM", I get "Destiny", "This theme", etc.

I already went to speech.microsoft.com/.../customspeech and recorded around 40 wav files of people saying "D-STUM". I've also created a file named "trans.txt" which lists every wav file with the word "D-STUM" after it, like this:

    D_stum_1.wav D-STUM
    D_stum_2.wav D-STUM
    D_stum_3.wav D-STUM
    D_stum_4.wav D-STUM
    ...

Then I uploaded a zip containing the wav files and the trans.txt file, trained a model with that data, and created an endpoint. I referenced this endpoint in my software and launched it.
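To illustrate what I mean by "referencing the endpoint", here is a minimal sketch using the Python Speech SDK (my actual project may differ; the key, region, and endpoint ID are placeholders):

    import azure.cognitiveservices.speech as speechsdk

    # Placeholder credentials -- replace with your own subscription key and region.
    speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")

    # Point the recognizer at the Custom Speech deployment instead of the base model.
    speech_config.endpoint_id = "YOUR_CUSTOM_ENDPOINT_ID"

    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    # Single-shot recognition from the default microphone.
    result = recognizer.recognize_once()
    print(result.text)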

I expect my custom speech-to-text model to recognize people saying "D-STUM" and display "D-STUM" as text, but I have never had "D-STUM" displayed, even after customizing the model.

Did I do something wrong? Is this the right way to do custom training? Are 40 samples not enough for the model to be trained properly?

Thank you for your answers.

Upvotes: 1

Views: 337

Answers (1)

Nicolas R

Reputation: 14619

Custom Speech offers several ways to improve recognition of specific words:

  • By providing audio samples with their transcriptions, as you have done
  • By providing text samples (without audio)

Based on my previous use cases, I would strongly suggest creating a text training file with 5 to 10 sentences, each containing "D-STUM" in its usage context, and then duplicating those sentences 10 to 20 times in the file.
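For example, such a text file could look like this (the sentences below are only illustrative; use wording that matches how the term actually comes up for your users), with the whole block repeated 10 to 20 times:

    The D-STUM starts at nine every morning.
    Please join the D-STUM if you have updates for the team.
    We discussed the blocking issue during the D-STUM.
    The D-STUM was shorter than usual today.
    Let's move the D-STUM to the afternoon.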

This approach worked for us to get specific words recognized.

Additionally, if you are using "en-US" or "de-DE" as the target language, you can use a pronunciation file (see the custom pronunciation section of the Custom Speech documentation).
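As far as I remember, a pronunciation file is a tab-delimited text file with the display form on the left and the spoken form (lowercase) on the right; the spoken form below is only a guess at how your users pronounce "D-STUM":

    D-STUM	dee stum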

Upvotes: 1
