user3218338

Reputation: 682

Using class tokens in Google Speech v2

How can I use class tokens when performing speech transcription with Google Speech v2? According to this documentation, they are available in version 2: https://cloud.google.com/speech-to-text/v2/docs/class-tokens

However, I cannot find any example or documentation showing how to actually use these tokens.

Here's an example in Speech v1 that uses the class token $TIME; it successfully transcribes my file.wav into the value 0720 (07:20).

from google.cloud import speech
import os

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'auth.json'

with open("file.wav", "rb") as f:
    audio_content = f.read()
    audio = speech.RecognitionAudio(content=audio_content)

# create client instance 
client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    language_code="sv-SE",
    speech_contexts=[  # Add class tokens here
        speech.SpeechContext(
            phrases=["$TIME"]  # These are your class tokens
        )
    ]
)

response = client.recognize(request={"config": config, "audio": audio})
for result in response.results:
    print(result.alternatives[0].transcript)

Here's another example, using Speech v2, with my attempt at using class tokens. This one fails to recognise the file as 0720 and is outperformed by v1, which makes me think the class token is not being respected.

import os
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = "test"
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'auth.json'

with open("file.wav", "rb") as f:
    audio_content = f.read()

client = SpeechClient()
config = cloud_speech.RecognitionConfig(
    auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
    language_codes=["sv-SE"],
    model="short",
    adaptation=cloud_speech.SpeechAdaptation(
        phrase_sets=[
            cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
                inline_phrase_set=cloud_speech.PhraseSet(
                    phrases=[
                        {
                            "value": "$TIME",  # class token, written as in v1
                            "boost": 20,
                        }
                    ]
                )
            )
        ]
    ),
)

request = cloud_speech.RecognizeRequest(
    recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
    config=config,
    content=audio_content,
)

# Transcribes the audio into text
response = client.recognize(request=request)
for result in response.results:
    print(result.alternatives[0].transcript)

Can anyone show me how to apply class tokens in the Speech v2 API, or give me clues as to why v2 is being beaten by v1 in this test?
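
In case it helps narrow things down, here is a variation of the adaptation block I was going to try next, on the assumption (which I have not been able to confirm in the v2 docs) that the token may need to be embedded in a longer phrase, the way phrase hints can be written in v1. The Swedish word "klockan" and the boost value are just guesses:

from google.cloud.speech_v2.types import cloud_speech

# Same structure as the config above, but with the class token also embedded
# inside a longer phrase ("klockan $TIME" ~ "at $TIME"). The phrase values
# are guesses, not confirmed v2 syntax.
adaptation = cloud_speech.SpeechAdaptation(
    phrase_sets=[
        cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
            inline_phrase_set=cloud_speech.PhraseSet(
                phrases=[
                    {"value": "$TIME", "boost": 20},
                    {"value": "klockan $TIME", "boost": 20},
                ]
            )
        )
    ]
)

# This would replace the adaptation= argument in the RecognitionConfig above.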

Upvotes: 0

Views: 30

Answers (0)
