Owen Pearson

Reputation: 469

How to enable speaker diarization in Google Cloud Speech library for Node JS?

I'm trying to build a web app that uses Google Cloud Speech-to-Text, in particular its speaker diarization feature. My server is written in Node.js, and I'm sending the audio file as a Google Cloud Storage URI. My speech config looks like this:

config: {
  encoding: 'LINEAR16',
  languageCode: 'en-GB',
  sampleRateHertz: 8000,
  enableSpeakerDiarization: true,
  diarizationSpeakerCount: true,
}

The transcripts I'm getting back have an empty 'words' array, which according to the Google Cloud Speech documentation should contain the speaker tags:

{ words: [],
transcript: 'and the rabbit sails at dusk',
confidence: 0.8659023642539978 }

It might be worth noting that if I add

enableWordTimeOffsets: true,

to my config, then I get a 'words' array like this:

[ { startTime: { seconds: '0', nanos: 0 },
endTime: { seconds: '0', nanos: 600000000 },
word: 'Hello' } etc..

Update

I wasn't importing the Node.js Google Cloud Speech library correctly. I did this:

const speech = require('@google-cloud/speech');

whereas, in order to use beta features, I needed to use this:

const speech = require('@google-cloud/speech').v1p1beta1;

After I made this change, the issue was resolved.
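For anyone reading the speaker tags afterwards, here is a minimal sketch of grouping the returned words by speaker. It assumes the response has the shape that recognize() returns in v1p1beta1, and that the last result's first alternative carries every word with a speakerTag field (my understanding of the docs, so check it against your own output). The helper name wordsBySpeaker is made up for illustration.

```javascript
// Hypothetical helper: groups recognized words by their speaker tag.
// Assumption: the last entry in response.results holds the full word list,
// and each word object has `word` and `speakerTag` fields.
function wordsBySpeaker(response) {
  const results = response.results || [];
  if (results.length === 0) return {};

  const last = results[results.length - 1];
  const firstAlternative = (last.alternatives || [])[0] || {};
  const words = firstAlternative.words || [];

  const grouped = {};
  for (const { word, speakerTag } of words) {
    if (!grouped[speakerTag]) grouped[speakerTag] = [];
    grouped[speakerTag].push(word);
  }
  return grouped;
}
```

If the 'words' array is still empty, this simply returns an empty object, which is a quick way to confirm whether diarization output is actually arriving.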

Upvotes: 4

Views: 1230

Answers (1)

jeeson

Reputation: 11

The config should look something like this:

const config = {
  encoding: 'LINEAR16',
  sampleRateHertz: 8000,
  languageCode: 'en-GB',
  enableAutomaticPunctuation: true,
  useEnhanced: true,
  model: 'video',
  // Diarization settings are nested under diarizationConfig
  diarizationConfig: {
    enableSpeakerDiarization: true,
    minSpeakerCount: 2,
    maxSpeakerCount: 3,
  },
};

For more information about RecognitionConfig, see:

https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig
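As a rough sketch of how a config like this might be wired into a request, assuming the client comes from the v1p1beta1 namespace mentioned in the question's update: the helper names buildDiarizationRequest and transcribe, and the bucket path in the usage comment, are hypothetical. The client is passed in as a parameter so the request-building part can be tried without credentials.

```javascript
// Hypothetical helper: builds a recognize() request for a Cloud Storage URI.
function buildDiarizationRequest(gcsUri) {
  return {
    audio: { uri: gcsUri },
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 8000,
      languageCode: 'en-GB',
      diarizationConfig: {
        enableSpeakerDiarization: true,
        minSpeakerCount: 2,
        maxSpeakerCount: 3,
      },
    },
  };
}

// `client` is expected to be a SpeechClient created from the v1p1beta1 namespace.
// recognize() resolves to an array whose first element is the response.
async function transcribe(client, gcsUri) {
  const [response] = await client.recognize(buildDiarizationRequest(gcsUri));
  return response;
}

// Usage (needs the @google-cloud/speech package and valid credentials;
// the bucket path is a placeholder):
// const speech = require('@google-cloud/speech').v1p1beta1;
// transcribe(new speech.SpeechClient(), 'gs://my-bucket/audio.wav')
//   .then((response) => console.log(JSON.stringify(response.results, null, 2)));
```

For longer recordings, longRunningRecognize is probably the better fit than recognize, but the request object would be built the same way.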

Upvotes: 0
