What is the sampling frequency of the audio generated by IBM Watson Text to Speech service?

I am using the Watson Text to Speech service to generate audio files in MP3 and WAV format. What is the default sampling frequency of this audio? Is there a way to specify the sampling rate when calling the API (for MP3 and WAV)? The Watson Speech to Text documentation recommends 16 kHz audio for broadband models.

Upvotes: 0

Answers (2)

Radek Kazbunda

Reputation: 11

This information is easy to find in the documentation.

Text to Speech voices are generated at 22,050 Hz. You can request a different output sampling rate, but the service only down- or upsamples the audio before returning it.
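To check which rate a downloaded WAV file actually has, you can read its header. A minimal sketch using Python's standard wave module; the helper names are hypothetical, and the in-memory file only stands in for a real response from the service.

```python
import io
import wave

DEFAULT_TTS_RATE_HZ = 22050  # default rate stated in this thread


def wav_sample_rate(source) -> int:
    """Return the sampling rate (Hz) recorded in a WAV header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def make_silent_wav(rate_hz: int, seconds: float = 0.1) -> io.BytesIO:
    """Hypothetical stand-in for a WAV file returned by the service."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    rate = wav_sample_rate(make_silent_wav(DEFAULT_TTS_RATE_HZ))
    print(f"{rate} Hz (Nyquist limit {rate // 2} Hz)")
```

For a real file, pass its path instead, e.g. wav_sample_rate("output.wav"). MP3 headers also carry the rate, but reading them needs a third-party library.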

Speech to Text generally supports 16,000 Hz for broadband models and 8,000 Hz for narrowband models. It is best to send audio in a container that stores the sampling rate in its header, such as FLAC or WAV (not raw PCM). Also, for Speech to Text the audio must actually contain information in the relevant part of the spectrum, so you cannot upsample 8 kHz telephone audio to 16 kHz and send it to a broadband model.
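The point about upsampling can be checked numerically. The sketch below (Python with NumPy; the test signals and helper names are made up for illustration) measures what fraction of a signal's energy lies above 4 kHz. A tone recorded at 8 kHz and linearly interpolated to 16 kHz keeps almost nothing above 4 kHz, whereas a genuine 16 kHz recording can.

```python
import numpy as np


def energy_fraction_above(signal: np.ndarray, rate_hz: int, cutoff_hz: float) -> float:
    """Fraction of the signal's spectral energy above cutoff_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate_hz)
    return float(spectrum[freqs > cutoff_hz].sum() / spectrum.sum())


def upsample_linear(signal: np.ndarray, src_hz: int, dst_hz: int) -> np.ndarray:
    """Naive upsampling by linear interpolation (adds no new speech content)."""
    t_src = np.arange(len(signal)) / src_hz
    n_dst = int(len(signal) * dst_hz / src_hz)
    t_dst = np.arange(n_dst) / dst_hz
    return np.interp(t_dst, t_src, signal)


# One second of a 1 kHz tone "recorded" at 8 kHz (narrowband).
narrow = np.sin(2 * np.pi * 1000 * np.arange(8000) / 8000)
upsampled = upsample_linear(narrow, 8000, 16000)

# One second recorded at 16 kHz that really contains a 6 kHz component.
t16 = np.arange(16000) / 16000
wide = np.sin(2 * np.pi * 1000 * t16) + np.sin(2 * np.pi * 6000 * t16)

print(f"upsampled 8 kHz audio: {energy_fraction_above(upsampled, 16000, 4000):.4f}")
print(f"genuine 16 kHz audio:  {energy_fraction_above(wide, 16000, 4000):.4f}")
```

The small nonzero value for the upsampled signal comes from interpolation images, not from recovered content; the header says 16 kHz, but the band above 4 kHz is still effectively empty.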

Upvotes: 0

Varun

Reputation: 76

The default sampling rate is 22,050 Hz, and you can change it with the rate parameter. According to the documentation, this parameter is optional. See https://console.bluemix.net/docs/services/text-to-speech/http.html#format
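As a sketch of how the rate can be requested: as far as I know, the rate is given as a parameter of the audio format passed in the accept query parameter or Accept header, for example audio/wav;rate=16000. Treat that syntax as an assumption to verify against the linked documentation; build_accept below is a hypothetical helper, not part of any SDK.

```python
from typing import Optional

DEFAULT_RATE_HZ = 22050  # default when no rate is given, per this answer


def build_accept(audio_format: str, rate_hz: Optional[int] = None) -> str:
    """Return an Accept value such as 'audio/wav;rate=16000'.

    Assumes the rate is passed as a ';rate=' MIME parameter; omit it
    to get the service default.
    """
    if rate_hz is None:
        return audio_format
    if rate_hz <= 0:
        raise ValueError("rate_hz must be a positive integer")
    return f"{audio_format};rate={rate_hz}"


if __name__ == "__main__":
    # 16 kHz audio, e.g. to feed a Speech to Text broadband model.
    print(build_accept("audio/wav", 16000))
    print(build_accept("audio/mp3"))  # service default (22,050 Hz)
```

If I recall IBM's Python SDK (ibm_watson) correctly, this string would go into the accept argument of TextToSpeechV1.synthesize. As the other answer notes, requesting 16 kHz only resamples the 22,050 Hz voice.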

Upvotes: 0
