Ozgur G

Reputation: 33

Creating a speech service from Azure Speech to Text Rest API

I can see there are two versions of REST API endpoints for Speech to Text in the Microsoft documentation links.

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription and https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-speech-to-text

One endpoint is [https://.api.cognitive.microsoft.com/sts/v1.0/issueToken], which refers to version 1.0, and the other is [api/speechtotext/v2.0/transcriptions], which refers to version 2.0. How can I create a speech-to-text service in the Azure Portal for the latter?

Whenever I create a service, in whichever region, it is always created for Speech to Text v1.0.

Any tips?

PS: I have a Visual Studio Enterprise account with a monthly allowance, and I am creating a paid subscription (S0) service rather than a free trial (F0) service.

Thanks, Ozgur

Upvotes: 1

Views: 2610

Answers (2)

Nicolas R

Reputation: 14619

Any official Microsoft Speech resource created in the Azure Portal is valid for Microsoft Speech 2.0.

I understand that the v1.0 in the token URL is surprising, but this token API is not part of the Speech API.

So go to Azure Portal, create a Speech resource, and you're done.

If you want to be sure, go to your created resource and copy your key. That is what you will use for authorization, in a header called Ocp-Apim-Subscription-Key, as explained here

Demo:

  • Get your key on your created resource
  • Go to https://[REGION].cris.ai/swagger/ui/index (REGION being the region where you created your speech resource)
  • Click on Authorize: you will see both forms of Authorization

  • Paste your key in the 1st one (subscription_Key), validate
  • Close this window
  • Test one of the endpoints, for example the one listing the speech endpoints, by going to the GET operation on /api/speechtotext/v2.0/endpoints
  • Click 'Try it out' and you will get a 200 OK reply!
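
The same check can be done outside the Swagger page. Below is a minimal Python sketch, assuming the API is served from the same [REGION].cris.ai host as the Swagger UI above (that host may have changed since this answer was written). The function names are hypothetical; only the path and the Ocp-Apim-Subscription-Key header come from the steps above.

```python
import urllib.request


def build_list_endpoints_request(region: str, subscription_key: str) -> urllib.request.Request:
    """Build the GET request for /api/speechtotext/v2.0/endpoints.

    Assumption: the API lives on the same <region>.cris.ai host as the Swagger UI.
    """
    url = f"https://{region}.cris.ai/api/speechtotext/v2.0/endpoints"
    return urllib.request.Request(
        url,
        # The resource key goes in this header, as in the Swagger "Authorize" dialog.
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
        method="GET",
    )


def list_endpoints(region: str, subscription_key: str) -> tuple[int, str]:
    """Send the request and return (HTTP status, response body as text)."""
    request = build_list_endpoints_request(region, subscription_key)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

Calling list_endpoints with your region and the key copied from the portal should give the same 200 OK reply as 'Try it out', provided the key belongs to a Speech resource in that region.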

Upvotes: 0

Jay Gong

Reputation: 23792

I understand your confusion, because the Microsoft documentation on this is ambiguous. From my research, let me clarify: two types of Speech-to-Text service exist, v1 and v2.

v1 can be found under the Cognitive Services category when you create it:

enter image description here

Based on statements in the Speech-to-text REST API document:

Before using the speech-to-text REST API, understand:

  • Requests that use the REST API and transmit audio directly can only contain up to 60 seconds of audio.
  • The speech-to-text REST API only returns final results. Partial results are not provided.

If sending longer audio is a requirement for your application, consider using the Speech SDK or a file-based REST API, like batch transcription.
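
To decide which route a given recording needs, the 60-second limit quoted above can be checked locally first. This is only a sketch, assuming the audio is an uncompressed WAV file; the function names and the constant are hypothetical helpers, not part of any Azure SDK.

```python
import io
import wave

# Limit quoted from the REST API documentation above.
MAX_DIRECT_AUDIO_SECONDS = 60


def wav_duration_seconds(wav_file) -> float:
    """Return the duration of a WAV file (path or binary file object) in seconds."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_direct_rest_api(wav_file) -> bool:
    """True if the audio is short enough to send directly in one REST request."""
    return wav_duration_seconds(wav_file) <= MAX_DIRECT_AUDIO_SECONDS


def make_silent_wav(seconds: int, rate: int = 8000) -> io.BytesIO:
    """Create an in-memory mono 16-bit WAV of silence, for trying the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buffer.seek(0)
    return buffer
```

If fits_direct_rest_api returns False for a recording, that is the case where the documentation points to the Speech SDK or batch transcription instead.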

So v1 has some limitations on file format and audio size. If you have further requirements, please look at the v2 API, Batch Transcription, hosted by Zoom Media. You can work it out by reading this document from ZM. You can create that Speech API in the Azure Marketplace:

enter image description here

That's the creation page for it:

enter image description here

Also, you can view the API documentation at the foot of the page above; it is the v2 API documentation.

Final tip:

v1's endpoint looks like: https://eastus.api.cognitive.microsoft.com/sts/v1.0/issuetoken

v2's endpoint looks like:

enter image description here

Upvotes: -1
