Reputation: 45
Using Google Speech-to-Text, I am able to transcribe an audio clip with the default parameters. However, I get an error message when I use the enable_speaker_diarization flag to identify individual speakers in the audio clip. Google documents it here. Since this is a long audio clip to recognize, I am using an async request, which Google recommends here.
My code -
def transcribe_gcs(gcs_uri):
    from google.cloud import speech_v1 as speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code='en-US',
        enable_speaker_diarization=True,
        diarization_speaker_count=2)

    # Long audio files require the asynchronous API.
    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    response = operation.result(timeout=3000)

    # The last result carries the words with speaker tags.
    result = response.results[-1]
    words_info = result.alternatives[0].words

    for word_info in words_info:
        print("word: '{}', speaker_tag: {}".format(word_info.word, word_info.speaker_tag))
After using -
transcribe_gcs('gs://bucket_name/filename.flac')
I get the error
ValueError: Protocol message RecognitionConfig has no "enable_speaker_diarization" field.
I am sure this has something to do with the libraries. I have tried every import variant I could find, such as
from google.cloud import speech_v1p1beta1 as speech
from google.cloud import speech
But I keep getting the same error. Note - I have already authenticated using the JSON file prior to running this code.
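For completeness, here is a minimal sketch of how I do that authentication (the key path is a placeholder for my actual service-account JSON file):

import os

# Point the Google client libraries at the service-account key.
# The path below is a placeholder.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/service-account-key.json'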
Upvotes: 4
Views: 3922
Reputation: 1554
The enable_speaker_diarization=True parameter of speech.types.RecognitionConfig is currently available only in the speech_v1p1beta1 library, so you need to import that library instead of the default speech one in order to use the parameter. I made some modifications to your code and it works fine for me. Take into account that you need to use a service account to run this code.
def transcribe_gcs(gcs_uri):
    from google.cloud import speech_v1p1beta1 as speech
    from google.cloud.speech_v1p1beta1 import enums
    from google.cloud.speech_v1p1beta1 import types

    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        language_code='en-US',
        enable_speaker_diarization=True,
        diarization_speaker_count=2)

    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    response = operation.result(timeout=3000)

    # The words in the last result carry the speaker tags for the
    # whole transcript, so only that result needs to be read.
    result = response.results[-1]
    words_info = result.alternatives[0].words

    # Group consecutive words with the same speaker tag and print
    # one line per speaker turn.
    tag = 1
    speaker = ""
    for word_info in words_info:
        if word_info.speaker_tag == tag:
            speaker = speaker + " " + word_info.word
        else:
            print("speaker {}: {}".format(tag, speaker))
            tag = word_info.speaker_tag
            speaker = "" + word_info.word

    # Print the final speaker's segment.
    print("speaker {}: {}".format(tag, speaker))
And the result should look something like this:
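(Sample output with invented transcript text, purely for illustration; the actual words and speaker assignments depend on your audio.)

speaker 1: hi I would like to book a table for two
speaker 2: sure what time works for you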
Upvotes: 10
Reputation: 489
The error occurs because you have not imported the right modules. To fix it, import the following:
from google.cloud import speech_v1p1beta1 as speech
from google.cloud.speech_v1p1beta1 import enums
from google.cloud.speech_v1p1beta1 import types
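With those imports in place, a config built the same way as in the question should accept the diarization fields; a minimal sketch (the GCS URI is taken from the question and the speaker count of 2 is just an example):

from google.cloud import speech_v1p1beta1 as speech
from google.cloud.speech_v1p1beta1 import types

# URI from the question; any GCS audio file works here.
gcs_uri = 'gs://bucket_name/filename.flac'

client = speech.SpeechClient()
audio = types.RecognitionAudio(uri=gcs_uri)

# The beta types accept the diarization fields that the v1 library rejects.
config = types.RecognitionConfig(
    language_code='en-US',
    enable_speaker_diarization=True,
    diarization_speaker_count=2)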
Upvotes: 0
Reputation: 11164
The cause of the error is similar for Node.js users. Import the beta version of the library with this call, and then use the speaker diarization features.
const speech = require('@google-cloud/speech').v1p1beta1;
Upvotes: 0