Reputation: 61
I am using a Dialogflow chatbot to detect text and audio inputs. Text works fine, but audio doesn't. I generate audio files (.mp3 and .wav), read them in Node.js, and send them, but there is no response: I get an empty intent, and the request isn't even logged in the Dialogflow History. However, when I provide a sample audio file from Dialogflow, it works fine. Here is my code; I am following the documentation provided by Dialogflow:
const dialogflow = require('@google-cloud/dialogflow');
const uuid = require('uuid');
const fs = require('fs');
const util = require('util');

const sessionId = uuid.v4();
const sessionClient = new dialogflow.SessionsClient({
  projectId,
  keyFilename,
});
const readFile = util.promisify(fs.readFile);
const inputAudio = await readFile('myfilepath.mp3');
const sessionPath = sessionClient.projectAgentSessionPath(projectId, sessionId);
const request = {
  session: sessionPath,
  queryInput: {
    audioConfig: {
      audioEncoding: 'AUDIO_ENCODING_LINEAR_16',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
    },
  },
  inputAudio,
};
const [response] = await sessionClient.detectIntent(request);
console.log('Detected intent:');
console.log(response);
const result = response.queryResult;
console.log(`  Query: ${result.queryText}`);
console.log(`  Response: ${result.fulfillmentText}`);
The response is always:
{
  responseId: '',
  queryResult: {
    fulfillmentMessages: [],
    outputContexts: [],
    queryText: '',
    speechRecognitionConfidence: 0,
    action: '',
    parameters: null,
    allRequiredParamsPresent: false,
    fulfillmentText: '',
    webhookSource: '',
    webhookPayload: null,
    intent: null,
    intentDetectionConfidence: 0,
    diagnosticInfo: null,
    languageCode: 'en-US',
    sentimentAnalysisResult: null
  },
  webhookStatus: null,
  outputAudio: <Buffer >,
  outputAudioConfig: null
}
Is there a specific way of generating the audio files that I have to follow?
Thank you.
Upvotes: 1
Views: 951
Reputation: 679
I think your issue is due to the encoding and the sample rate of your audio files.
I was able to reproduce the scenario and get the same response output using the samples from nodejs-dialogflow, in particular detect.js, by running the sample with an MP3 file like this:
node detect audio resources/book_a_room.mp3 -r 16000
Looking at the supported encodings by running the sample's help:
node detect audio -help
we can see that the available options are the following:
-e, --encoding The encoding of the input audio.
[choices: "AUDIO_ENCODING_LINEAR_16", "AUDIO_ENCODING_FLAC", "AUDIO_ENCODING_MULAW", "AUDIO_ENCODING_AMR",
"AUDIO_ENCODING_AMR_WB", "AUDIO_ENCODING_OGG_OPUS", "AUDIO_ENCODING_SPEEX_WITH_HEADER_BYTE"] [default:
"AUDIO_ENCODING_LINEAR_16"]
These options can also be seen in the Dialogflow API reference for AudioEncoding. From there we can conclude that MP3 is not a supported encoding, and that's why you get that empty response output.
Also, when looking at the Dialogflow API reference, you can see the following:
... Refer to the Cloud Speech API documentation for more details.
Looking into that documentation, in its encoding section, you can see that:
Note: Speech-to-Text supports WAV files with LINEAR16 or MULAW encoded audio.
So, going back to the encoding options from the Dialogflow API, the ones you could use for WAV files are:
AUDIO_ENCODING_LINEAR_16
AUDIO_ENCODING_MULAW
It's important to notice that the AUDIO_ENCODING_LINEAR_16 option is set by default.
Thus, you could use single-channel (mono) WAV files, setting the sample rate to match the one your WAV file actually has (e.g. 44100), and then you'll get the desired response. For example:
node detect audio resources/book_a_room_1ch_16Khz.wav -r 16000
Or
node detect audio resources/book-a-room_1ch_44.1Khz.wav -r 44100
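Translated back to your Node.js code, the request could look like this. This is a minimal sketch reusing the sessionClient, sessionPath, and readFile from your question; the WAV filename and the 44100 rate are just example values for a mono LINEAR16 file:
const inputAudio = await readFile('book_a_room_1ch_44.1Khz.wav'); // example: mono LINEAR16 WAV

const request = {
  session: sessionPath,
  queryInput: {
    audioConfig: {
      audioEncoding: 'AUDIO_ENCODING_LINEAR_16', // supported for WAV; MP3 is not
      sampleRateHertz: 44100, // must match the rate in the WAV header
      languageCode: 'en-US',
    },
  },
  inputAudio,
};
const [response] = await sessionClient.detectIntent(request);
console.log(response.queryResult.queryText, '->', response.queryResult.fulfillmentText);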
Otherwise (e.g. stereo audio, or a sample rate that doesn't match the WAV header), you'll get error messages like the following:
{ Error: 3 INVALID_ARGUMENT: Must use single channel (mono) audio, but WAV header indicates 2 channels.
at Object.callErrorFromStatus ...
And
{ Error: 3 INVALID_ARGUMENT: sample_rate_hertz (16000) in RecognitionConfig must either be omitted or match the value in the WAV header ( 44100).
at Object.callErrorFromStatus ...
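If you're not sure what your generated file actually contains, you can read the channel count and sample rate directly from the WAV header. A minimal sketch, assuming a canonical RIFF file where the fmt chunk immediately follows the 12-byte RIFF header (myfilepath.wav is a placeholder):
const fs = require('fs');

const header = fs.readFileSync('myfilepath.wav');
// In a canonical WAV header the channel count is a little-endian uint16
// at byte offset 22, and the sample rate a little-endian uint32 at offset 24.
console.log('channels:', header.readUInt16LE(22));
console.log('sample rate:', header.readUInt32LE(24));
If the file turns out to be stereo or at an unexpected rate, you can re-encode it first, for example with ffmpeg (assuming it is installed); this produces a mono, 16 kHz, LINEAR16 WAV that matches -r 16000:
ffmpeg -i myfilepath.mp3 -ac 1 -ar 16000 myfilepath.wav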
Upvotes: 1