Mandeep Singh

Reputation: 8234

How to use the Google Speech API for audio with 2 channels

We have audio recordings with 2 people speaking on different channels. I am following the official documentation for Node.js. First of all, I got an error saying that the payload size exceeded the maximum limit.

ubuntu@ip-xxxx:~/nodejs-docs-samples/speech$ node recognize.js async /home/ubuntu/output.wav
(node:18306) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Request payload size exceeds the limit: 10485760 bytes.

The documentation, however, only mentions limits in terms of recording length, not file size.

Is there a workaround for this?
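
For a rough sense of how the 10485760-byte limit in the error maps to recording length, the arithmetic for uncompressed LINEAR16 audio can be sketched as below. The helper name `maxInlineSeconds` is made up for illustration, and the estimate ignores the WAV header and any request overhead, so treat it as an upper bound rather than an exact figure.

```javascript
// Hypothetical helper: estimates how many seconds of uncompressed PCM audio
// fit under a payload limit. Ignores the WAV header and request overhead.
const PAYLOAD_LIMIT_BYTES = 10485760; // value reported in the error message

function maxInlineSeconds (sampleRateHertz, channels, bytesPerSample = 2) {
  const bytesPerSecond = sampleRateHertz * channels * bytesPerSample;
  return PAYLOAD_LIMIT_BYTES / bytesPerSecond;
}

// 16 kHz, 2 channels, 16-bit samples: 64000 bytes per second
console.log(maxInlineSeconds(16000, 2).toFixed(2)); // about 163.84 seconds
```

Under these assumptions a 16 kHz stereo recording reaches the size limit after a bit under three minutes, and a mono recording at the same rate after roughly twice that.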

I also tried a smaller file and got a configuration error:

ubuntu@ip-xxx:~/nodejs-docs-samples/speech$ node recognize.js async /home/ubuntu/output2.wav
(node:18291) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Invalid Configuration, Does not match Wav File Header.
Wav Header Contents:
Encoding: LINEAR16
Channels: 2
Sample Rate: 16000.
Request Contents:
Encoding: linear16
Channels: 1
Sample Rate: 16000.

I am not sure whether the API allows 2-channel audio input, since I could not find any such config in the documentation. However, I found a suggestion to split the audio into individual channels and process them separately. What is the recommended way of doing this programmatically?
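
The mismatch reported in the error (2 channels in the header, 1 in the request) can be checked locally before calling the API. The following is a minimal sketch, not part of the official samples: `readWavHeader` is a hypothetical helper, and it assumes a canonical WAV layout where the "fmt " chunk immediately follows the RIFF header, so it will reject files with extra chunks in between.

```javascript
// Reads channel count and sample rate from the start of a WAV file.
// Assumption: canonical RIFF/WAVE layout with the "fmt " chunk right after
// the RIFF header, so the fields sit at fixed byte offsets.
function readWavHeader (buffer) {
  if (buffer.length < 44 ||
      buffer.toString('ascii', 0, 4) !== 'RIFF' ||
      buffer.toString('ascii', 8, 12) !== 'WAVE' ||
      buffer.toString('ascii', 12, 16) !== 'fmt ') {
    throw new Error('Not a canonical WAV header');
  }
  return {
    audioFormat: buffer.readUInt16LE(20), // 1 means uncompressed PCM
    channels: buffer.readUInt16LE(22),
    sampleRate: buffer.readUInt32LE(24),
    bitsPerSample: buffer.readUInt16LE(34)
  };
}
```

Passing it the first 44 bytes of a recording (for example, read with `fs.readSync`) gives the values that the request configuration has to match.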

Upvotes: 1

Views: 2136

Answers (1)

Mandeep Singh

Reputation: 8234

I eventually took this approach:

  • Split each file into its channels using sox.
  • Upload both channel files to Google Cloud Storage. (For local files, the Speech API will not process recordings longer than 1 minute, so big files must be uploaded to Google Cloud Storage.)
  • Pass each of the files through the speech recognition API.
  • Keep the transcripts separate. There is no way to merge the two, since the Google Speech API does not provide timestamps for the words.
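
The third step can be sketched as building one request per channel file. This is only an illustration: `buildRecognitionRequests` is a hypothetical helper, the bucket name and language code are placeholders, and the field names follow the v1 `RecognitionConfig` of the Speech API, which may differ from the API version used when this was written.

```javascript
// Hypothetical helper: builds one recognition request per channel file.
// The bucket name and language code are placeholders, not values from this post.
function buildRecognitionRequests (bucket, channelFiles, languageCode = 'en-US') {
  return channelFiles.map((file) => ({
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 16000, // sample rate reported in the question's WAV header
      languageCode
    },
    // Audio stored in Google Cloud Storage is referenced by URI, not sent inline.
    audio: { uri: `gs://${bucket}/${file}` }
  }));
}

const requests = buildRecognitionRequests('my-bucket', [
  'output.wav_channel1.wav',
  'output.wav_channel2.wav'
]);
console.log(requests.map((r) => r.audio.uri));
```

Each request would then go to the client's asynchronous recognition call; in recent versions of the `@google-cloud/speech` package that should be `longRunningRecognize`, but check the version you have installed.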

Here is a helper function that splits a file into its channels:

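// child_process has no built-in execAsync, so promisify execFile instead.
// execFile passes arguments as an array, which also copes with spaces in file names.
const execFileAsync = require('util').promisify(require('child_process').execFile);

// Splits a 2-channel file into two mono WAV files. Requires sox to be installed.
function splitFileToChannels (fileName) {
  const output = {
    channel1: `${fileName}_channel1.wav`,
    channel2: `${fileName}_channel2.wav`
  };
  // "remix N" writes only input channel N to the output file.
  return Promise.all([
    execFileAsync('sox', [fileName, output.channel1, 'remix', '1']),
    execFileAsync('sox', [fileName, output.channel2, 'remix', '2'])
  ]).then(() => output);
}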

I also had to convert the MP3 file to WAV format before splitting it into channels.
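
The conversion step could be scripted in the same style as the split helper. The post does not say which tool was used for it, so the sketch below assumes ffmpeg is installed and on the PATH. `buildConvertArgs` and `convertMp3ToWav` are hypothetical names, and the 16 kHz sample rate is taken from the WAV header shown in the question.

```javascript
// Assumption: ffmpeg is installed and on the PATH.
const runFfmpeg = require('util').promisify(require('child_process').execFile);

// Builds the ffmpeg argument list for an MP3 to 16-bit PCM WAV conversion.
function buildConvertArgs (mp3Path, wavPath) {
  // -acodec pcm_s16le: 16-bit little-endian PCM (LINEAR16)
  // -ar 16000: resample to 16 kHz
  return ['-i', mp3Path, '-acodec', 'pcm_s16le', '-ar', '16000', wavPath];
}

// Runs the conversion and resolves with the path of the WAV file.
function convertMp3ToWav (mp3Path, wavPath) {
  return runFfmpeg('ffmpeg', buildConvertArgs(mp3Path, wavPath)).then(() => wavPath);
}
```

The channel count is left untouched here, so a stereo MP3 yields a stereo WAV that can then be split per channel.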

Upvotes: 3
