shanewwarren

Reputation: 2294

Batch create transcription always results in: The recordings URI contains invalid data

I would like to use the Azure Speech Services Batch Transcription API to create a transcription of my audio file. I've already had success using the Speech Service SDK (for Node.js), but I was interested in trying out one of the newer features available in the v3.1 preview version of the API (displayFormWordLevelTimestampsEnabled), so I figured I had to use the REST API to do that.

My problem is that whatever input I feed the Create Transcript API for contentUrls, I always end up with the same error:

"error": {
   "code": "InvalidData",
   "message": "The recordings URI contains invalid data."
}

After a little digging, I found some tips in the Azure portal about using sox to transcode the audio file into the required format.

The specific format they mention in the portal documentation shows: If you are using REST API, make sure that it uses one of the formats in this table:

Format | Codec | Bit rate | Sample rate
WAV    | PCM   | 256 kbps | 16 kHz, mono
OGG    | OPUS  | 256 kbps | 16 kHz, mono

With the sox specific commands being:

Activity | SoX command
Check the audio file format. | sox --i
Convert the audio file to single channel, 16-bit, 16 kHz. | sox -b 16 -e signed-integer -c 1 -r 16k -t wav .wav

I ran my mp3 through the second command and verified the result with the first. The file's properties look like this:

Input File     : 'out5.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:30.09 = 481488 samples ~ 2256.97 CDDA sectors
File Size      : 963k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM
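
The same check can be scripted. Below is a minimal sketch using Python's standard `wave` module, which only reads uncompressed PCM WAV files; the helper names `describe_wav`, `matches_portal_format`, and `make_silent_wav` are made up for illustration. It also shows where the 256k bit rate comes from: 1 channel x 16000 samples/s x 16 bits = 256000 bits/s.

```python
import io
import wave


def describe_wav(source):
    """Return channel count, sample rate, bit depth, and PCM bit rate.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        bits_per_sample = wav.getsampwidth() * 8
    return {
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits_per_sample,
        # Uncompressed PCM bit rate in bits per second.
        "bit_rate": channels * sample_rate * bits_per_sample,
    }


def matches_portal_format(info):
    """Check against the portal table: mono, 16 kHz, 16-bit PCM."""
    return (
        info["channels"] == 1
        and info["sample_rate"] == 16000
        and info["bits_per_sample"] == 16
    )


def make_silent_wav(seconds=1, sample_rate=16000):
    """Build a small mono 16-bit WAV in memory, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    info = describe_wav(make_silent_wav())
    print(info, matches_portal_format(info))
```

Passing a path such as 'out5.wav' to `describe_wav` instead of the in-memory buffer would inspect the converted file directly.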

Finally, I uploaded the file to a public S3 bucket to use as the content URL for my request:

POST https://westus.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions

{
  "contentUrls": [
        "https://s3.us-west-1.amazonaws.com/xxxx/out5.wav"
  ],
  "locale": "en-US",
  "displayName": "Test"
}
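
For reference, the request above could be assembled in Python roughly as follows. This is a sketch, not the official client: the function name `build_create_transcription_request` and its parameters are made up for illustration, and it assumes the subscription key is passed in the `Ocp-Apim-Subscription-Key` header. The request is only built here, not sent.

```python
import json
import urllib.request


def build_create_transcription_request(region, subscription_key, content_url,
                                       locale="en-US", display_name="Test",
                                       api_version="v3.0"):
    """Build (but do not send) a Create Transcription POST request."""
    url = (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/speechtotext/{api_version}/transcriptions"
    )
    body = {
        "contentUrls": [content_url],
        "locale": locale,
        "displayName": display_name,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: key-based auth via this header.
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_transcription_request(
        "westus",
        "<subscription-key>",  # placeholder, not a real key
        "https://s3.us-west-1.amazonaws.com/xxxx/out5.wav",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Sending it would be a matter of passing the returned object to `urllib.request.urlopen`.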

It still failed with the same error shown above. Any insights into what might be wrong? Thanks!

Update:

The answer below mentions a reports.json file that should be referenced from the Get Transcript/Create Transcript API calls.

When I call the Create Transcript API, the response is:

{
    "self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95",
    "model": {
        "self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/models/base/c3b008fa-eb47-4f6d-a5b9-71dd37870bb7"
    },
    "links": {
        "files": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95/files"
    },
    "properties": {
        "diarizationEnabled": false,
        "wordLevelTimestampsEnabled": false,
        "displayFormWordLevelTimestampsEnabled": false,
        "channels": [
            0,
            1
        ],
        "punctuationMode": "DictatedAndAutomatic",
        "profanityFilterMode": "Masked"
    },
    "lastActionDateTime": "2022-09-13T23:37:09Z",
    "status": "NotStarted",
    "createdDateTime": "2022-09-13T23:37:09Z",
    "locale": "en-US",
    "displayName": "Test"
}

Calling the Get Transcript API, I see:

{
    "self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95",
    "model": {
        "self": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/models/base/c3b008fa-eb47-4f6d-a5b9-71dd37870bb7"
    },
    "links": {
        "files": "https://westus.api.cognitive.microsoft.com/speechtotext/v3.1-preview.1/transcriptions/02815462-e9c0-4fdc-8bbe-7b0e78152f95/files"
    },
    "properties": {
        "diarizationEnabled": false,
        "wordLevelTimestampsEnabled": false,
        "displayFormWordLevelTimestampsEnabled": false,
        "channels": [
            0,
            1
        ],
        "punctuationMode": "DictatedAndAutomatic",
        "profanityFilterMode": "Masked",
        "error": {
            "code": "InvalidData",
            "message": "The recordings URI contains invalid data."
        }
    },
    "lastActionDateTime": "2022-09-13T23:37:22Z",
    "status": "Failed",
    "createdDateTime": "2022-09-13T23:37:09Z",
    "locale": "en-US",
    "displayName": "Test"
}
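
When polling, the failure details can be pulled out of a response shaped like the one above with a few lines of Python. This is a sketch based only on the fields visible in that response (`status` and `properties.error`); the helper name `transcription_error` is made up for illustration.

```python
def transcription_error(transcription):
    """Return (code, message) if the transcription failed, else None."""
    if transcription.get("status") != "Failed":
        return None
    error = transcription.get("properties", {}).get("error", {})
    return error.get("code"), error.get("message")


if __name__ == "__main__":
    # Trimmed copy of the Get Transcript response shown above.
    failed = {
        "status": "Failed",
        "properties": {
            "error": {
                "code": "InvalidData",
                "message": "The recordings URI contains invalid data.",
            }
        },
    }
    print(transcription_error(failed))
```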

And finally, looking at the transcript files, I get an empty list:

{
    "values": []
}

I see no reference to a reports.json, or any data populated here at all.

Upvotes: 1

Views: 1668

Answers (1)

chlandsi

Reputation: 61

In many cases you can get detailed error information by doing a GET on https://westus.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions/<transcription_id>/files and looking at the report.json that is referenced there.
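
A sketch of that lookup in Python, once the files response has been parsed as JSON. The field names are assumptions about the v3.0 response shape rather than something shown in this thread: each entry in `values` is assumed to carry a `kind` (with `TranscriptionReport` marking the report), a `name`, and a `links.contentUrl`. The helper name `find_report_url` and the sample URL are made up for illustration.

```python
def find_report_url(files_response):
    """Return the content URL of the report file, or None if none is listed."""
    for entry in files_response.get("values", []):
        # Assumption: the report is tagged with kind "TranscriptionReport"
        # and/or named "report.json".
        is_report = (
            entry.get("kind") == "TranscriptionReport"
            or entry.get("name") == "report.json"
        )
        if is_report:
            return entry.get("links", {}).get("contentUrl")
    return None


if __name__ == "__main__":
    # An empty file list yields no report URL.
    print(find_report_url({"values": []}))

    # Hypothetical non-empty listing, for illustration only.
    sample = {
        "values": [
            {
                "name": "report.json",
                "kind": "TranscriptionReport",
                "links": {"contentUrl": "https://example.invalid/report.json"},
            }
        ]
    }
    print(find_report_url(sample))
```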

If that doesn't help, you could post the transcription ID(s) of the failed transcription(s) so someone from the team (I am one of them) can look at the service logs.

Upvotes: 6
