How to enable speaker diarization in Google Cloud Speech library for Node JS?

Question

I'm currently trying to create a web app that uses Google Cloud Speech-to-Text, in particular its speaker diarization feature. My server is written in Node.js, and I'm sending the audio file as a Google Cloud Storage URI. My speech config looks like this:

config: {
  encoding: 'LINEAR16',
  languageCode: 'en-GB',
  sampleRateHertz: 8000,
  enableSpeakerDiarization: true,
  diarizationSpeakerCount: true,
}

The transcripts I'm getting back have an empty 'words' array, which the Google Cloud Speech documentation says should contain the speaker tags:

{ words: [],
transcript: 'and the rabbit sails at dusk',
confidence: 0.8659023642539978 }

It might be worth noting that if I add

enableWordTimeOffsets: true,

to my config, then I get a 'words' array like this:

[ { startTime: { seconds: '0', nanos: 0 },
endTime: { seconds: '0', nanos: 600000000 },
word: 'Hello' } etc..

Update

I wasn't importing the Node.js Google Cloud Speech library correctly. I did this:

const speech = require('@google-cloud/speech');

whereas in order to use beta features I needed to use this:

const speech = require('@google-cloud/speech').v1p1beta1;

After I made this change, the issue was resolved.
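
For anyone landing here, this is a rough sketch of how the pieces might fit together once the beta client is imported. It is not a definitive implementation. It assumes the @google-cloud/speech package is installed and credentials are configured. buildRequest, groupBySpeaker and transcribeWithSpeakers are hypothetical helper names, and the speaker count of 2 is an assumed value (as far as I can tell the field expects a number, not a boolean). It also assumes, per my reading of the documentation, that the speaker-tagged words arrive in the last result of the response.

```javascript
// Sketch only: assumes @google-cloud/speech is installed and credentials are set up.

// Build a recognition request for an audio file in Google Cloud Storage.
// The speaker count of 2 is an assumption for illustration.
function buildRequest(gcsUri) {
  return {
    config: {
      encoding: 'LINEAR16',
      languageCode: 'en-GB',
      sampleRateHertz: 8000,
      enableSpeakerDiarization: true,
      diarizationSpeakerCount: 2,
    },
    audio: { uri: gcsUri },
  };
}

// Hypothetical helper: merge consecutive words that share a speakerTag
// into segments of the form { speakerTag, text }.
function groupBySpeaker(words) {
  const segments = [];
  for (const { word, speakerTag } of words) {
    const last = segments[segments.length - 1];
    if (last && last.speakerTag === speakerTag) {
      last.text += ' ' + word;
    } else {
      segments.push({ speakerTag, text: word });
    }
  }
  return segments;
}

async function transcribeWithSpeakers(gcsUri) {
  // Loaded here so the helpers above can be tried without the package;
  // note the .v1p1beta1 suffix, which was the fix in the update above.
  const speech = require('@google-cloud/speech').v1p1beta1;
  const client = new speech.SpeechClient();

  const [response] = await client.recognize(buildRequest(gcsUri));
  const results = response.results;
  if (results.length === 0) return [];

  // Assumption: the final result carries every word with its speakerTag.
  const { words } = results[results.length - 1].alternatives[0];
  return groupBySpeaker(words);
}
```

Calling transcribeWithSpeakers with a gs:// URI should then resolve to an array of per-speaker segments. For longer recordings, longRunningRecognize may be needed in place of recognize.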
