brightonBreezy

Reputation: 23

Is Speech-to-Text voice training data sampled at 48 kHz still good for improving recognition of 16 kHz speech?

We are training our Azure Cognitive Services Custom Speech model using data recorded in .wav (RIFF) format at 16-bit, 16 kHz, as per the documentation.

However, we have obtained a dataset of speech recorded at 48 kHz and encoded as MP3. Speech Studio seems able to train the service on this data without problems, but we would like to know: will training at the higher sample rate only help when recognising streamed audio that is also at the higher rate, or does the sample rate not matter?

Upvotes: 0

Views: 252

Answers (1)

GiftA-MSFT

Reputation: 479

A higher sample rate like the 48 kHz you describe is desirable for audio quality, but it generally won't influence speech recognition. As long as your audio meets the minimum format requirements, speech recognition should work just fine.
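If you want to confirm which of your WAV files already match the format from the question, here is a minimal sketch using Python's standard `wave` module. The function names `describe_wav` and `matches_target` are hypothetical helpers, not part of any Azure SDK, and the defaults (16-bit, 16 kHz) are simply the values quoted in the question. It only reads WAV headers, so MP3 files would need decoding with a separate tool first.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate_hz) read from a WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()


def matches_target(path, rate_hz=16000, width_bytes=2):
    """True if the file has the given sample rate and sample width.

    Defaults are an assumption taken from the question: 16 kHz, 16-bit (2 bytes).
    """
    _, width, rate = describe_wav(path)
    return width == width_bytes and rate == rate_hz
```

For example, `matches_target("sample.wav")` (where `sample.wav` is a placeholder path) returns `False` for a 48 kHz recording, which tells you the file is above the documented rate rather than that it is unusable.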

Upvotes: 0
