Hadi Nasser

Reputation: 61

DialogFlow not recognizing audio inputs

I am using a Dialogflow chatbot to detect intents from text and audio inputs. Text works fine, but audio does not. I generate audio files (.mp3 and .wav), read them in Node.js, and send them, but I get an empty intent in the response, and the request is not even logged in the Dialogflow History. When I send a sample audio file provided by Dialogflow, it works fine. Here is my code, which follows the documentation provided by Dialogflow:

const sessionId = uuid.v4();

  const sessionClient = new dialogflow.SessionsClient({
    projectId,
    keyFilename,
  });

  const readFile = util.promisify(fs.readFile);
  const inputAudio = await readFile('myfilepath.mp3');
  const sessionPath = sessionClient.projectAgentSessionPath(projectId, sessionId);
  const request = {
    session: sessionPath,
    queryInput: {
      audioConfig: {
        audioEncoding: 'AUDIO_ENCODING_LINEAR_16',
        sampleRateHertz: 16000,
        languageCode: 'en-US',
      },
    },
    inputAudio,
  };
  const [response] = await sessionClient.detectIntent(request);

  console.log('Detected intent:');
  console.log(response);

  const result = response.queryResult;

  console.log(`  Query: ${result.queryText}`);
  console.log(`  Response: ${result.fulfillmentText}`);

The response is always

{
  responseId: '',
  queryResult: {
    fulfillmentMessages: [],
    outputContexts: [],
    queryText: '',
    speechRecognitionConfidence: 0,
    action: '',
    parameters: null,
    allRequiredParamsPresent: false,
    fulfillmentText: '',
    webhookSource: '',
    webhookPayload: null,
    intent: null,
    intentDetectionConfidence: 0,
    diagnosticInfo: null,
    languageCode: 'en-US',
    sentimentAnalysisResult: null
  },
  webhookStatus: null,
  outputAudio: <Buffer >,
  outputAudioConfig: null
}

Is there a specific way I have to generate the audio files?

Thank you.

Upvotes: 1

Views: 951

Answers (1)

S. Tyr

Reputation: 679

I think your issue is due to the encoding and the sample rate of your audio files.

I was able to reproduce the scenario and get the same response output using the samples from nodejs-dialogflow, in particular detect.js, by running the sample with an MP3 file like this:

node detect audio resources/book_a_room.mp3 -r 16000

The encodings supported by the sample can be listed by running:

node detect audio -help

We can see that the options available are the following:

  -e, --encoding    The encoding of the input audio.
      [choices: "AUDIO_ENCODING_LINEAR_16", "AUDIO_ENCODING_FLAC",
                "AUDIO_ENCODING_MULAW", "AUDIO_ENCODING_AMR",
                "AUDIO_ENCODING_AMR_WB", "AUDIO_ENCODING_OGG_OPUS",
                "AUDIO_ENCODING_SPEEX_WITH_HEADER_BYTE"]
      [default: "AUDIO_ENCODING_LINEAR_16"]

These options can also be seen on the Dialogflow API reference for AudioEncoding. From there we can conclude that MP3 is not a supported encoding, and that's why you get that response output.
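Since MP3 input produces an empty response rather than an error, it can help to check the file type before calling detectIntent. Here is a minimal sketch that guesses the container from the first bytes of the file. Note that sniffAudioContainer is a hypothetical helper of mine, not part of the Dialogflow client, and the MP3 frame-sync check is only a heuristic:

```javascript
// Sketch: guess the audio container from a file's leading bytes.
// sniffAudioContainer is a hypothetical helper, not a Dialogflow API.
function sniffAudioContainer(buf) {
  // latin1 maps each byte to one character, so the comparisons are exact.
  const tag = (start, end) => buf.toString('latin1', start, end);

  // WAV: "RIFF" <4-byte size> "WAVE"
  if (buf.length >= 12 && tag(0, 4) === 'RIFF' && tag(8, 12) === 'WAVE') {
    return 'wav';
  }
  // FLAC streams start with the marker "fLaC".
  if (buf.length >= 4 && tag(0, 4) === 'fLaC') return 'flac';
  // Ogg pages (e.g. Ogg Opus) start with "OggS".
  if (buf.length >= 4 && tag(0, 4) === 'OggS') return 'ogg';
  // MP3 files often start with an ID3v2 tag...
  if (buf.length >= 3 && tag(0, 3) === 'ID3') return 'mp3';
  // ...or directly with an MPEG audio frame sync (11 set bits). Heuristic only.
  if (buf.length >= 2 && buf[0] === 0xff && (buf[1] & 0xe0) === 0xe0) {
    return 'mp3';
  }
  return 'unknown';
}
```

With the code from the question, you could call sniffAudioContainer(inputAudio) right after reading the file and refuse to send anything that comes back as 'mp3' or 'unknown'.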

Also when looking at the Dialogflow API reference you can see the following:

... Refer to the Cloud Speech API documentation for more details.

Looking into that documentation and going to the encoding section, you can see that:

Note: Speech-to-Text supports WAV files with LINEAR16 or MULAW encoded audio.

So going back to the encoding options from the Dialogflow API we can see that for WAV files, the ones you could use are:

  • AUDIO_ENCODING_LINEAR_16
  • AUDIO_ENCODING_MULAW

Note that AUDIO_ENCODING_LINEAR_16 is the default option in the sample.

Thus, you can use single-channel (mono) WAV files, as long as the sample rate you pass matches the one your WAV file actually has (e.g. 44100); then you'll get the desired response. For example:

node detect audio resources/book_a_room_1ch_16Khz.wav -r 16000

Or

node detect audio resources/book-a-room_1ch_44.1Khz.wav -r 44100

Otherwise, you'll get error messages like the following:

{ Error: 3 INVALID_ARGUMENT: Must use single channel (mono) audio, but WAV header indicates 2 channels.
    at Object.callErrorFromStatus ...

And

{ Error: 3 INVALID_ARGUMENT: sample_rate_hertz (16000) in RecognitionConfig must either be omitted or match the value in the WAV header ( 44100).
    at Object.callErrorFromStatus ...
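To avoid both errors in your own code, you could read the channel count and sample rate from the WAV header and build the audioConfig from them instead of hardcoding 16000. This is only a sketch under some assumptions: wavAudioConfig is a hypothetical helper (not part of the Dialogflow client), it handles plain 16-bit PCM and mu-law WAV files only, and it rejects anything else, including WAVE_FORMAT_EXTENSIBLE headers:

```javascript
// Sketch: derive a Dialogflow audioConfig from a WAV header.
// wavAudioConfig is a hypothetical helper, not a Dialogflow API.
function wavAudioConfig(buf, languageCode) {
  const tag = (start, end) => buf.toString('latin1', start, end);
  if (buf.length < 12 || tag(0, 4) !== 'RIFF' || tag(8, 12) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }

  // Walk the RIFF chunks until the "fmt " chunk is found.
  let offset = 12;
  while (offset + 8 <= buf.length) {
    const id = tag(offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'fmt ') {
      if (offset + 8 + 16 > buf.length) throw new Error('Truncated fmt chunk');
      const format = buf.readUInt16LE(offset + 8); // 1 = PCM, 7 = mu-law
      const channels = buf.readUInt16LE(offset + 10);
      const sampleRate = buf.readUInt32LE(offset + 12);
      const bitsPerSample = buf.readUInt16LE(offset + 22);

      if (channels !== 1) {
        throw new Error(`Expected mono audio, got ${channels} channels`);
      }
      let audioEncoding;
      if (format === 1 && bitsPerSample === 16) {
        audioEncoding = 'AUDIO_ENCODING_LINEAR_16';
      } else if (format === 7) {
        audioEncoding = 'AUDIO_ENCODING_MULAW';
      } else {
        throw new Error(`Unsupported WAV format tag ${format}`);
      }
      // sampleRateHertz is taken from the header, so it always matches the file.
      return { audioEncoding, sampleRateHertz: sampleRate, languageCode };
    }
    // Chunks are padded to an even number of bytes.
    offset += 8 + size + (size % 2);
  }
  throw new Error('No fmt chunk found');
}
```

In the code from the question, that would mean replacing the hardcoded audioConfig with audioConfig: wavAudioConfig(inputAudio, 'en-US').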

Upvotes: 1
