Qwert

Reputation: 61

How to set up streamingRecognize with Google Cloud Speech-to-Text V2 in Node.js?

I am trying to set up streamingRecognize() from Google Cloud Speech-to-Text V2 in Node.js to stream audio data, and it always throws the same error on the initial recognizer request that sets up the stream:

Error: 3 INVALID_ARGUMENT: Invalid resource field value in the request.
    at callErrorFromStatus (/Users/<filtered>/backend/node_modules/@grpc/grpc-js/src/call.ts:81:17)
    at Object.onReceiveStatus (/Users/<filtered>/backend/node_modules/@grpc/grpc-js/src/client.ts:701:51)
    at Object.onReceiveStatus (/Users/<filtered>/backend/node_modules/@grpc/grpc-js/src/client-interceptors.ts:416:48)
    at /Users/<filtered>/backend/node_modules/@grpc/grpc-js/src/resolving-call.ts:111:24
    at processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeBidiStreamRequest (/Users/<filtered>/backend/node_modules/@grpc/grpc-js/src/client.ts:685:42)
    at ServiceClientImpl.<anonymous> (/Users/<filtered>/backend/node_modules/@grpc/grpc-js/src/make-client.ts:189:15)
    at /Users/<filtered>/backend/node_modules/@google-cloud/speech/build/src/v2/speech_client.js:318:29
    at /Users/<filtered>/backend/node_modules/google-gax/src/streamingCalls/streamingApiCaller.ts:71:19
    at /Users/<filtered>/backend/node_modules/google-gax/src/normalCalls/timeout.ts:54:13
    at StreamProxy.setStream (/Users/<filtered>/backend/node_modules/google-gax/src/streamingCalls/streaming.ts:204:20)
    at StreamingApiCaller.call (/Users/<filtered>/backend/node_modules/google-gax/src/streamingCalls/streamingApiCaller.ts:88:12)
    at /Users/<filtered>/backend/node_modules/google-gax/src/createApiCall.ts:118:26
    at processTicksAndRejections (node:internal/process/task_queues:95:5)

{
  code: 3,
  details: 'Invalid resource field value in the request.',
  metadata: Metadata {
    internalRepr: Map(2) {
      'google.rpc.errorinfo-bin' => [Array],
      'grpc-status-details-bin' => [Array]
    },
    options: {}
  },
  statusDetails: [
    ErrorInfo {
      metadata: [Object],
      reason: 'RESOURCE_PROJECT_INVALID',
      domain: 'googleapis.com'
    }
  ],
  reason: 'RESOURCE_PROJECT_INVALID',
  domain: 'googleapis.com',
  errorInfoMetadata: {
    service: 'speech.googleapis.com',
    method: 'google.cloud.speech.v2.Speech.StreamingRecognize'
  }
}

The stream setup process has two steps: 1. send the recognizer request object to tell Google which recognizer to use for the following audio data (consisting of the path to the recognizer as a string plus an optional config object to override certain options of the recognizer), and 2. send the same kind of request with no config but an audio buffer containing the audio to be transcribed.
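Roughly, the write sequence I am aiming for looks like this (sketch only; configRequest and audioChunk are placeholders for the objects described above):

const stream = client.streamingRecognize();

// Step 1: recognizer path plus optional config, sent exactly once.
stream.write(configRequest);

// Step 2: one or more writes carrying only the raw audio bytes.
stream.write({ audio: audioChunk });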

I never got as far as sending the audio data, since the initial recognizer request always failed.

It would be great if someone could help me with this issue, since it seems to be a rather simple one that might be super obvious if you know where it originates.

My guesses as to where I made a mistake:

  1. I misconfigured something in Google Cloud, but this does not seem too plausible since everything else worked except the streaming requests.
  2. I built the request object wrong. If this is the case, please also provide the request object for sending the audio buffer.

I have read through the Google Cloud Speech-to-Text V2 docs and tried to implement everything as described; in the end it should return the transcribed text. Specifically, I have:

  1. Set up a recognizer in the Google Cloud console.
  2. Checked if all necessary APIs were enabled.
  3. Checked if the service account etc. has the correct permissions for authentication.
  4. Checked if authentication works correctly.

I also tried several times to implement streamingRecognize() as follows and with some slight variations:

public async initialize() {

    // One variant of the streaming config I tried (I also renamed
    // streaming_config to just "config", and omitted it entirely):
    const streamingConfig = {
      config: {
        autoDecodingConfig: {},
      },
    };

    const recognizerName = `projects/${this.projectId}/locations/global/recognizers/_`;
    const transcriptionRequest = {
      recognizer: recognizerName,
      streaming_config: streamingConfig,
    };

    const stream = this.client
      .streamingRecognize()
      .on("data", function (response) {
        console.log(response);
      })
      .on("error", function (response) {
        console.log(response);
      });

    // Write request objects.
    stream.write(transcriptionRequest);
  }

I have also tried several recognizer_ids instead of "_" in recognizerName. I have tried several different transcriptionRequests where I omitted the streaming_config or renamed it to just "config". I have triple-checked my projectId, which I have also exchanged for the project number (found on the main page of the Google Cloud console). Nothing worked, and I always received the same error.

Besides that, I have also tried to make normal createRecognizer and recognize requests using v2 like this, which worked fine:

 // Creates a Recognizer: WORKS
  public async createRecognizer() {
    const recognizerRequest = {
      parent: `projects/${this.projectId}/locations/global`,
      recognizerId: "rclatest",
      recognizer: {
        languageCodes: ["en-US"],
        model: "telephony",
      },
    };

    const operation = await this.client.createRecognizer(recognizerRequest);
    const recognizer = operation[0].result;
    console.log("Created new recognizer:", recognizer);
  }

  // Transcribes Audio: WORKS
  public async transcribeFile() {
    const recognizerName = `projects/${this.projectId}/locations/global/recognizers/${this.recognizerId}`;
    const content = fs.readFileSync(this.audioFilePath).toString("base64");
    const transcriptionRequest = {
      recognizer: recognizerName,
      config: {
        // Automatically detects audio encoding
        autoDecodingConfig: {},
      },
      content: content,
    };

    const response = await this.client.recognize(transcriptionRequest);
    for (const result of response[0].results) {
      console.log(`Transcript: ${result.alternatives[0].transcript}`);
    }
  }

Upvotes: 6

Views: 4696

Answers (4)

Dominik Koller

Reputation: 21

I finally got this to work. I followed the types provided by the TypeScript definitions to understand the nested config structure that is necessary here. If you use plain JS, just leave out the type annotations.

Three things to note at first:

  1. In v2, streamingRecognize does not seem to work without a 'recognizer', which is a pre-saved configuration. You can use the default recognizer "_", though.
  2. The standard streamingRecognize() does not work; it produces the error above. Instead, use _streamingRecognize(). I have no idea why.
  3. You do not pass the config to _streamingRecognize(). Instead, you send it as the first write, after which you send the audio as objects of the form { audio: data }.

Here is my code that works:

// This is where I got the types from, which I used to figure out the nested config structure.
import { v2 as speech } from '@google-cloud/speech';
import { google } from '@google-cloud/speech/build/protos/protos';

// Must have the GOOGLE_APPLICATION_CREDENTIALS environment variable set.
const speechClient = new speech.SpeechClient();

// 'event' here carries my own runtime parameters (encoding, sample rate, language).
const recognitionConfig: google.cloud.speech.v2.IRecognitionConfig = {
    // autoDecodingConfig and explicitDecodingConfig are a oneof in the proto;
    // only one of them takes effect, so pick whichever you need.
    autoDecodingConfig: {},
    explicitDecodingConfig: {
        encoding: event.encoding,
        sampleRateHertz: event.sampleRateHertz,
        audioChannelCount: 1,
    },
    languageCodes: [event.languageCode],
    model: 'long', // the 'video' model does not exist in v2
};

const streamingRecognitionConfig: google.cloud.speech.v2.IStreamingRecognitionConfig = {
    config: recognitionConfig,
    streamingFeatures: {
        interimResults: true,
    },
};

const streamingRecognizeRequest: google.cloud.speech.v2.IStreamingRecognizeRequest = {
    recognizer: `projects/${GOOGLE_PROJECT_ID}/locations/global/recognizers/_`,
    streamingConfig: streamingRecognitionConfig,
};

const recognizeStream = speechClient
    ._streamingRecognize()
    .on('error', (err) => {
        console.error(err);
    })
    .on('data', async (data) => {
        // your code to react to answers from the API
    });

recognizeStream.write(streamingRecognizeRequest); // Do this once and only once

When sending audio chunks, you must send

recognizeStream.write({ audio: data }); // where data is your audio chunk
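For example, to feed chunks from a file, something like the following should work (a sketch; the file name, chunk size, and raw PCM format are assumptions and must match your decoding config):

import fs from 'node:fs';

// Sketch: stream raw PCM audio from a file in small chunks.
// 'audio.raw' is a hypothetical file matching the decoding config above.
const audioStream = fs.createReadStream('audio.raw', { highWaterMark: 4096 });
audioStream.on('data', (chunk) => {
    recognizeStream.write({ audio: chunk });
});
audioStream.on('end', () => recognizeStream.end());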

Note the GOOGLE_PROJECT_ID, where you put the ID of your project. You can find this in the Google Cloud Console.

Now about the recognizer URL: if I use another region, the call fails. I suspect you'd have to create a recognizer in that region first. I think you have to do this in code, as I found no way of creating one in the Google Cloud Console. See more in the issue linked in Vladislav's answer.
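For reference, creating a regional recognizer in code might look roughly like this (an untested sketch; the region, recognizer ID, and model are assumptions, and the client has to point at the matching regional endpoint):

// Sketch: create a recognizer in a specific region.
const regionalClient = new speech.SpeechClient({
    apiEndpoint: 'europe-west4-speech.googleapis.com',
});
const [operation] = await regionalClient.createRecognizer({
    parent: `projects/${GOOGLE_PROJECT_ID}/locations/europe-west4`,
    recognizerId: 'my-recognizer', // hypothetical ID
    recognizer: { languageCodes: ['en-US'], model: 'long' },
});
await operation.promise();
// Reference it afterwards as:
// projects/<project-id>/locations/europe-west4/recognizers/my-recognizer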

I am on version "@google-cloud/speech": "6.1.1",

Upvotes: 2

vladaman

Reputation: 3908

We have the following working solution for dynamic batch speech recognition. Please note the importance of setting the proper endpoint in the client config, and of matching the recognizer location to the model.

The Speech-to-Text V2 API has an option to use dynamic batch. Dynamic batch processes audio at a lower level of urgency. If you enable dynamic batch, you will be billed at a discounted rate.

const speech = require('@google-cloud/speech').v2;
const GOOGLE_PROJECT_ID = "your-project-id";

const gcsUri = "gs://speech-samples-00/commercial_mono.wav"; // must be in Google Storage

const configSpeechGoogle = {
    projectId: GOOGLE_PROJECT_ID,
    keyFilename: 'google-credentials.json',
    apiEndpoint: "europe-west4-speech.googleapis.com" // needed for the chirp model
}
const speechClient = new speech.SpeechClient(configSpeechGoogle);

const recognizer = `projects/${GOOGLE_PROJECT_ID}/locations/europe-west4/recognizers/_`; // the location must match your endpoint and model availability

Submit Job:

const batchConfig = {
  languageCodes: ["cs-CZ"],
  model: "chirp", // Available in europe-west4, us-central1, asia-southeast1
  // autoDecodingConfig and explicitDecodingConfig are a oneof; only one takes effect.
  autoDecodingConfig: {},
  explicitDecodingConfig: {
    encoding: "LINEAR16",
    sampleRateHertz: 8000,
    audioChannelCount: 1
  }
};
const configRequest = {
  recognizer: recognizer,
  config: batchConfig,
  files: [{
     uri: gcsUri
  }],
  recognitionOutputConfig: {
    gcsOutputConfig: {
       uri: "gs://my-results-bucket/outputs"
    }
  },
  processingStrategy: 'DYNAMIC_BATCHING'
};

Get Results:

const operation = await speechClient.batchRecognize(configRequest);
const data = await operation[0].promise();
console.log('Transcribe response', data[0].results);

Upvotes: 1

Vladislav Sorokin

Reputation: 415

I put my working code here: nodejs-docs-samples/issues/3578

v1 → v2 migration considerations

  1. To use v2 you need to create a recognizer; I did it with the client.createRecognizer function (the code is in the issue above).
  2. The config object should now be sent as the first data to the stream object, immediately before the audio. So if you did recognizingClient.write(audioData) before, you should now do recognizingClient.write(newConfigWithRecognizer) (but only once!) and then recognizingClient.write({ audio: audioData }); see the sketch at the end of this answer.
  3. The config object itself has been changed to:
public streamingConfig?: (google.cloud.speech.v2.IStreamingRecognitionConfig|null);

/** Properties of a StreamingRecognitionConfig. */
interface IStreamingRecognitionConfig {

    /** StreamingRecognitionConfig config */
    config?: (google.cloud.speech.v2.IRecognitionConfig|null);

    /** StreamingRecognitionConfig configMask */
    configMask?: (google.protobuf.IFieldMask|null);

    /** StreamingRecognitionConfig streamingFeatures */
    streamingFeatures?: (google.cloud.speech.v2.IStreamingRecognitionFeatures|null);
}
When instantiating the streaming client, use _streamingRecognize() (this is likely to change).
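Putting the three points together, a minimal sketch (the variable names follow the wording above; 'client' is a v2 SpeechClient, and the recognizer path assumes one created as in point 1):

// Sketch combining the migration points above.
const recognizingClient = client._streamingRecognize();

// Once, before any audio:
recognizingClient.write({
    recognizer: `projects/${projectId}/locations/global/recognizers/my-recognizer`,
    streamingConfig: {
        config: { languageCodes: ['en-US'], model: 'long', autoDecodingConfig: {} },
        streamingFeatures: { interimResults: true },
    },
});

// Then, for every audio chunk:
recognizingClient.write({ audio: audioData });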

Upvotes: 0

Aleksei Krikunov

Reputation: 26

The following code works in my case:

const recognizer = `projects/${projectId}/locations/global/recognizers/_`
const google_model = "latest_long"
const streamingConfig = {
    config: {
        languageCodes: ["en-US"],
        model: google_model,
        autoDecodingConfig: {}
    },
};

const configRequest = {
    recognizer: recognizer,
    streamingConfig: streamingConfig,
};

const recognizeStream = client
    ._streamingRecognize()
    .on('error', (err) => {
        console.error(err);
    })
    .on('data', (data) => {
        console.log(data);
    });

recognizeStream.write(configRequest);
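The snippet above only writes the config request; the audio itself would follow in subsequent writes (a sketch, assuming raw audio chunks in a variable named chunk):

// Subsequent writes carry the audio data itself.
recognizeStream.write({ audio: chunk });
// ...repeat per chunk, then:
recognizeStream.end();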

UPD:

At the request of @Cybersupernova, I have added a screenshot with the code and the run results: Screenshot

Upvotes: 0
