I'm currently trying to create a web app that uses Google Cloud Speech-to-Text, and in particular its speaker diarization feature. My server is written in Node.js, and I'm sending the audio file as a Google Cloud Storage URI. My speech config looks like this:
config: {
  encoding: 'LINEAR16',
  languageCode: 'en-GB',
  sampleRateHertz: 8000,
  enableSpeakerDiarization: true,
  diarizationSpeakerCount: 2, // example value: an integer (expected number of speakers), not a boolean
}
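For reference, the config above is only part of the request: the audio is passed separately, by its Cloud Storage URI. Below is a minimal sketch of assembling the whole request object. The helper name `buildRequest` and its input checks are hypothetical, added for illustration; the config fields are the ones used above, with `diarizationSpeakerCount` passed as an integer.

```javascript
// Hypothetical helper: wraps the config above and a Cloud Storage URI
// into the { audio, config } shape that recognize()/longRunningRecognize() take.
function buildRequest(gcsUri, speakerCount) {
  if (!gcsUri.startsWith('gs://')) {
    throw new Error(`expected a gs:// URI, got ${gcsUri}`);
  }
  if (!Number.isInteger(speakerCount) || speakerCount < 1) {
    throw new Error('speakerCount must be a positive integer');
  }
  return {
    audio: { uri: gcsUri }, // audio is read from Cloud Storage, not uploaded inline
    config: {
      encoding: 'LINEAR16',
      languageCode: 'en-GB',
      sampleRateHertz: 8000,
      enableSpeakerDiarization: true,
      diarizationSpeakerCount: speakerCount, // an integer, not a boolean
    },
  };
}
```

Rejecting a non-integer count up front makes a `true`-style mistake fail loudly on your side instead of being silently coerced somewhere in the request pipeline.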
The transcripts I get back have an empty 'words' array, even though the Google Cloud Speech documentation says this is where the speaker tags should appear:
{ words: [],
transcript: 'and the rabbit sails at dusk',
confidence: 0.8659023642539978 }
It might be worth noting that if I add
enableWordTimeOffsets: true,
to my config, I do get a 'words' array, like this:
[ { startTime: { seconds: '0', nanos: 0 },
    endTime: { seconds: '0', nanos: 600000000 },
    word: 'Hello' },
  ... ]
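The `startTime`/`endTime` values are protobuf Duration objects, with `seconds` arriving as a string (as in the output above) and `nanos` as a number. A small sketch for turning them into plain seconds; `durationToSeconds` and `wordTimings` are hypothetical helper names, not part of the client library.

```javascript
// Convert a Duration-like { seconds, nanos } object to a number of seconds.
// `seconds` may be a string (as in the output above), so it is passed through Number().
function durationToSeconds(d) {
  if (!d) return 0;
  const seconds = Number(d.seconds || 0);
  const nanos = Number(d.nanos || 0);
  return seconds + nanos / 1e9;
}

// Summarise each word entry as { word, start, end } with times in seconds.
function wordTimings(words) {
  return words.map((w) => ({
    word: w.word,
    start: durationToSeconds(w.startTime),
    end: durationToSeconds(w.endTime),
  }));
}
```

For the 'Hello' entry above this gives a start of 0 and an end of 0.6 seconds.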
Update: I wasn't importing the Node.js Google Cloud Speech library correctly. I did this:
const speech = require('@google-cloud/speech');
whereas, to use beta features such as speaker diarization, I needed to use this:
const speech = require('@google-cloud/speech').v1p1beta1;
After I made this change, the issue was resolved.
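With the beta client in use, each entry in `words` should also carry a `speakerTag`. The sketch below assumes, as in Google's published diarization samples, that the last result in the response holds the complete word list with speaker tags, and it groups consecutive words into speaker turns. `transcribeBySpeaker` and `groupBySpeaker` are hypothetical names. The client is passed in rather than constructed, so the grouping can be tried without credentials.

```javascript
// Sketch: transcribe a gs:// file and group the diarized words into speaker turns.
// `client` is assumed to be a v1p1beta1 SpeechClient, e.g.
//   const speech = require('@google-cloud/speech').v1p1beta1;
//   const client = new speech.SpeechClient();
async function transcribeBySpeaker(client, request) {
  // longRunningRecognize resolves to [operation]; operation.promise() resolves to [response].
  const [operation] = await client.longRunningRecognize(request);
  const [response] = await operation.promise();
  return groupBySpeaker(response);
}

// Assumption: with diarization enabled, the final result's words carry
// speakerTag values for the whole audio, so only the last result is read.
function groupBySpeaker(response) {
  const results = response.results || [];
  if (results.length === 0) return [];
  const words = results[results.length - 1].alternatives[0].words || [];
  const turns = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === w.speakerTag) {
      last.text += ' ' + w.word; // same speaker: extend the current turn
    } else {
      turns.push({ speaker: w.speakerTag, text: w.word }); // speaker changed: start a new turn
    }
  }
  return turns;
}
```

`longRunningRecognize` is used here because it accepts longer gs:// audio; for short clips `recognize()` should also work. In a real app you would pass a client created from the `v1p1beta1` import above, together with a request of the form `{ audio: { uri }, config }`.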
The config should look something like this:
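const config = {
  encoding: 'LINEAR16',
  sampleRateHertz: 8000,
  languageCode: 'en-GB', // this comma was missing, which is a syntax error
  enableAutomaticPunctuation: true,
  useEnhanced: true,
  model: 'video',
  // Diarization settings are grouped in a nested diarizationConfig object
  diarizationConfig: {
    enableSpeakerDiarization: true,
    minSpeakerCount: 2,
    maxSpeakerCount: 3,
  },
};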
For more information about RecognitionConfig, see:
https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig
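As a rough sketch, the config from this answer could be produced by a small builder that also sanity-checks the speaker range before the request is sent. The name `buildDiarizationConfig` and the `1 <= minSpeakers <= maxSpeakers` check are hypothetical, added for illustration and not taken from the API reference. The field values are copied from the config above.

```javascript
// Hypothetical helper: builds the v1p1beta1-style config shown above, with the
// diarization settings nested under diarizationConfig rather than at the top level.
function buildDiarizationConfig({ minSpeakers, maxSpeakers }) {
  if (!Number.isInteger(minSpeakers) || !Number.isInteger(maxSpeakers)) {
    throw new Error('speaker counts must be integers');
  }
  // Client-side sanity check (an assumption, not a documented API rule).
  if (minSpeakers < 1 || minSpeakers > maxSpeakers) {
    throw new Error('expected 1 <= minSpeakers <= maxSpeakers');
  }
  return {
    encoding: 'LINEAR16',
    sampleRateHertz: 8000,
    languageCode: 'en-GB',
    enableAutomaticPunctuation: true,
    useEnhanced: true,
    model: 'video',
    diarizationConfig: {
      enableSpeakerDiarization: true,
      minSpeakerCount: minSpeakers,
      maxSpeakerCount: maxSpeakers,
    },
  };
}
```

Calling `buildDiarizationConfig({ minSpeakers: 2, maxSpeakers: 3 })` reproduces the config in this answer.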