Is Speech-to-Text training data sampled at 48 kHz still useful for improving recognition of 16 kHz speech?

Question

We are training our Azure Cognitive Services Custom Speech model using data recorded in .wav (RIFF) format at 16-bit, 16 kHz, as per the documentation.
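For reference, this is roughly how we check that a recording matches that format before uploading it. It is only a minimal sketch using Python's standard `wave` module; the helper names `describe_wav` and `matches_16bit_16khz` are our own and not part of any Azure SDK, and it only reads WAV headers, so it does not handle the MP3 files.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels) read from a WAV header."""
    with wave.open(path, "rb") as wav:
        # getsampwidth() reports bytes per sample, so convert to bits.
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def matches_16bit_16khz(path):
    """True if the file is 16-bit PCM sampled at 16 kHz (channel count is not checked)."""
    rate, bits, _channels = describe_wav(path)
    return rate == 16000 and bits == 16
```

Calling `matches_16bit_16khz("sample.wav")` on one of our existing recordings returns `True`; a 48 kHz WAV file would return `False`.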

However, we have obtained a dataset of speech recorded at 48 kHz and encoded as MP3. Speech Studio appears to train the model on this data without problems, but we would like to know whether training data at the higher sample rate only helps with recognising streamed audio that is also at 48 kHz, or whether the sample rate of the training data does not matter.


Answers (1)
