Reputation: 2294
I would like to use Azure Speech Services Batch Transcription APIs to create a transcription of my audio file. I've already had success using the Speech Service SDK (for Node.js), but was interested in trying out one of the newer features available in the v3.1 preview version of the API (displayFormWordLevelTimestampsEnabled), so I figured I had to use the REST API to do that.
Overall my problem is that whatever input I feed the Create Transcript API for contentUrls, I always end up getting the same error:
"error": {
"code": "InvalidData",
"message": "The recordings URI contains invalid data."
}
After a little digging, I found some tips in the Azure portal suggesting I use sox to transcode the audio file into the specific format requested.
The portal documentation says: "If you are using REST API, make sure that it uses one of the formats in this table:"
| Format | Codec | Bit rate | Sample Rate |
|---|---|---|---|
| WAV | PCM | 256 kbps | 16 kHz, mono |
| OGG | OPUS | 256 kbps | 16 kHz, mono |
With the sox-specific commands being:
| Activity | SoX command |
|---|---|
| Check the audio file format. | sox --i <filename> |
| Convert the audio file to single channel, 16-bit, 16 kHz. | sox <input> -b 16 -e signed-integer -c 1 -r 16k -t wav <output>.wav |
I ran my mp3 through the second command and verified the resulting file with the first, which reports:
Input File : 'out5.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:30.09 = 481488 samples ~ 2256.97 CDDA sectors
File Size : 963k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
Finally, I uploaded the file to a public S3 bucket to use as the content URL for my request:
POST https://westus.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions
{
"contentUrls": [
"https://s3.us-west-1.amazonaws.com/xxxx/out5.wav"
],
"locale": "en-US",
"displayName": "Test"
}
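For reference, here's a minimal sketch of how I'm issuing this request from Node (18+, with the built-in fetch); the region, key, and S3 URL are placeholders for my real values, and the commented-out property is the v3.1 preview flag I ultimately want to enable:
```typescript
// Minimal sketch of the Create Transcription request (Node 18+, global fetch).
const region = "westus";
const key = process.env.SPEECH_KEY!; // Speech resource subscription key

async function createTranscription(): Promise<void> {
  const res = await fetch(
    `https://${region}.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions`,
    {
      method: "POST",
      headers: {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        contentUrls: ["https://s3.us-west-1.amazonaws.com/xxxx/out5.wav"],
        locale: "en-US",
        displayName: "Test",
        // With the v3.1-preview.1 endpoint I'd also add:
        // properties: { displayFormWordLevelTimestampsEnabled: true },
      }),
    }
  );
  console.log(res.status, await res.json());
}

createTranscription().catch(console.error);
```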
Still it failed with the same error that I posted above. Any insights into what might be wrong? Thanks!
Update:
The answer below mentioned being able to reference a report.json file via the Get Transcript / Create Transcript API calls.
When I use the Create Transcript API, my payload is:
{
"self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95",
"model": {
"self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/models/base/c3b008fa-eb47-4f6d-a5b9-71dd37870bb7"
},
"links": {
"files": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95/files"
},
"properties": {
"diarizationEnabled": false,
"wordLevelTimestampsEnabled": false,
"displayFormWordLevelTimestampsEnabled": false,
"channels": [
0,
1
],
"punctuationMode": "DictatedAndAutomatic",
"profanityFilterMode": "Masked"
},
"lastActionDateTime": "2022-09-13T23:37:09Z",
"status": "NotStarted",
"createdDateTime": "2022-09-13T23:37:09Z",
"locale": "en-US",
"displayName": "Test"
}
Calling the Get Transcript API I see:
{
"self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95",
"model": {
"self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/models/base/c3b008fa-eb47-4f6d-a5b9-71dd37870bb7"
},
"links": {
"files": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95/files"
},
"properties": {
"diarizationEnabled": false,
"wordLevelTimestampsEnabled": false,
"displayFormWordLevelTimestampsEnabled": false,
"channels": [
0,
1
],
"punctuationMode": "DictatedAndAutomatic",
"profanityFilterMode": "Masked",
"error": {
"code": "InvalidData",
"message": "The recordings URI contains invalid data."
}
},
"lastActionDateTime": "2022-09-13T23:37:22Z",
"status": "Failed",
"createdDateTime": "2022-09-13T23:37:09Z",
"locale": "en-US",
"displayName": "Test"
}
And finally, looking at the transcription files, I'm getting an empty list:
{
"values": []
}
I see no reference to a report.json, or any data populated here at all.
Upvotes: 1
Views: 1668
Reputation: 61
In many cases you can get detailed error information by doing a GET on https://westus.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions/<transcription_id>/files and looking at the report.json that is referenced there.
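For example, a rough sketch of how you could fetch it (region, key, and transcription id are placeholders):
```typescript
// Rough sketch: list the transcription's files and download report.json (Node 18+).
const region = "westus";
const key = process.env.SPEECH_KEY!;          // Speech resource subscription key
const transcriptionId = "<transcription_id>"; // id of the failed transcription

async function fetchReport(): Promise<void> {
  const filesRes = await fetch(
    `https://${region}.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions/${transcriptionId}/files`,
    { headers: { "Ocp-Apim-Subscription-Key": key } }
  );
  const files = await filesRes.json();

  // The report is the entry whose kind is "TranscriptionReport";
  // its links.contentUrl is a pre-authorized download URL.
  const report = files.values.find((f: any) => f.kind === "TranscriptionReport");
  if (!report) {
    console.log("No report.json listed for this transcription.");
    return;
  }
  const reportJson = await (await fetch(report.links.contentUrl)).json();
  console.log(JSON.stringify(reportJson, null, 2));
}

fetchReport().catch(console.error);
```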
If that doesn't help, you could post the transcription ID(s) of the failed transcription(s) so someone from the team (I am one of them) can look at the service logs.
Upvotes: 6