Reputation: 2294
I would like to use Azure Speech Services Batch Transcription APIs to create a transcription of my audio file. I've already had success using the Speech Service SDK (for Node.js), but was interested in trying out one of the newer features available in the v3.1 preview version of the API (displayFormWordLevelTimestampsEnabled), so I figured I had to use the REST API to do that.
Overall my problem is that whatever input I feed the Create Transcript API for contentUrls, I always end up getting the same error:
"error": {
"code": "InvalidData",
"message": "The recordings URI contains invalid data."
}
After a little digging, I found some tips in the Azure portal suggesting I use sox to transcode the audio file into the specific format requested.
The portal documentation says: "If you are using REST API, make sure that it uses one of the formats in this table:"
| Format | Codec | Bit rate | Sample Rate |
|---|---|---|---|
| WAV | PCM | 256 kbps | 16 kHz, mono |
| OGG | OPUS | 256 kbps | 16 kHz, mono |
With the sox-specific commands being:
| Activity | SoX command |
|---|---|
| Check the audio file format. | sox --i <filename> |
| Convert the audio file to single channel, 16-bit, 16 kHz. | sox <input> -b 16 -e signed-integer -c 1 -r 16k -t wav <output>.wav |
I ran my mp3 through the second command and verified the resulting file with the first, which reports:
Input File : 'out5.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:30.09 = 481488 samples ~ 2256.97 CDDA sectors
File Size : 963k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
Finally, I uploaded the file to a public S3 bucket to use as the content URL for my request:
POST https://westus.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions
{
"contentUrls": [
"https://s3.us-west-1.amazonaws.com/xxxx/out5.wav"
],
"locale": "en-US",
"displayName": "Test"
}
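For reference, here's a minimal sketch of how I'm issuing this request from Node (18+, with the built-in fetch); the region, key, and S3 URL are placeholders for my real values, and the commented-out property is the v3.1 preview flag I ultimately want to enable:
```typescript
// Minimal sketch of the Create Transcription request (Node 18+, global fetch).
const region = "westus";
const key = process.env.SPEECH_KEY!; // Speech resource subscription key

async function createTranscription(): Promise<void> {
  const res = await fetch(
    `https://${region}.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions`,
    {
      method: "POST",
      headers: {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        contentUrls: ["https://s3.us-west-1.amazonaws.com/xxxx/out5.wav"],
        locale: "en-US",
        displayName: "Test",
        // With the v3.1-preview.1 endpoint I'd also add:
        // properties: { displayFormWordLevelTimestampsEnabled: true },
      }),
    }
  );
  console.log(res.status, await res.json());
}

createTranscription().catch(console.error);
```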
Still it failed with the same error that I posted above. Any insights into what might be wrong? Thanks!
Update:
The answer below mentioned being able to reference a report.json file via the Get Transcript / Create Transcript API calls.
When I use the Create Transcript API, my payload is:
{
"self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95",
"model": {
"self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/models/base/c3b008fa-eb47-4f6d-a5b9-71dd37870bb7"
},
"links": {
"files": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95/files"
},
"properties": {
"diarizationEnabled": false,
"wordLevelTimestampsEnabled": false,
"displayFormWordLevelTimestampsEnabled": false,
"channels": [
0,
1
],
"punctuationMode": "DictatedAndAutomatic",
"profanityFilterMode": "Masked"
},
"lastActionDateTime": "2022-09-13T23:37:09Z",
"status": "NotStarted",
"createdDateTime": "2022-09-13T23:37:09Z",
"locale": "en-US",
"displayName": "Test"
}
Calling the Get Transcript API I see:
{
"self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95",
"model": {
"self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/models/base/c3b008fa-eb47-4f6d-a5b9-71dd37870bb7"
},
"links": {
"files": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95/files"
},
"properties": {
"diarizationEnabled": false,
"wordLevelTimestampsEnabled": false,
"displayFormWordLevelTimestampsEnabled": false,
"channels": [
0,
1
],
"punctuationMode": "DictatedAndAutomatic",
"profanityFilterMode": "Masked",
"error": {
"code": "InvalidData",
"message": "The recordings URI contains invalid data."
}
},
"lastActionDateTime": "2022-09-13T23:37:22Z",
"status": "Failed",
"createdDateTime": "2022-09-13T23:37:09Z",
"locale": "en-US",
"displayName": "Test"
}
And finally, looking at the transcription files, I'm getting an empty list:
{
"values": []
}
I see no reference to a report.json, or any data populated here at all.
Upvotes: 1
Views: 1668
Reputation: 61
In many cases you can get detailed error information by doing a GET on https://westus.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions/<transcription_id>/files and looking at the report.json that is referenced there.
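For example, a rough sketch of how you could fetch it (region, key, and transcription id are placeholders):
```typescript
// Rough sketch: list the transcription's files and download report.json (Node 18+).
const region = "westus";
const key = process.env.SPEECH_KEY!;          // Speech resource subscription key
const transcriptionId = "<transcription_id>"; // id of the failed transcription

async function fetchReport(): Promise<void> {
  const filesRes = await fetch(
    `https://${region}.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions/${transcriptionId}/files`,
    { headers: { "Ocp-Apim-Subscription-Key": key } }
  );
  const files = await filesRes.json();

  // The report is the entry whose kind is "TranscriptionReport";
  // its links.contentUrl is a pre-authorized download URL.
  const report = files.values.find((f: any) => f.kind === "TranscriptionReport");
  if (!report) {
    console.log("No report.json listed for this transcription.");
    return;
  }
  const reportJson = await (await fetch(report.links.contentUrl)).json();
  console.log(JSON.stringify(reportJson, null, 2));
}

fetchReport().catch(console.error);
```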
If that doesn't help, you could post the transcription ID(s) of the failed transcription(s) so someone from the team (I am one of them) can look at the service logs.
Upvotes: 6