Reputation: 21

Google Cloud Speech Using Java

I am trying to call the Google Cloud Speech API from my Java program, but I get this error:

Exception in thread "main" com.google.api.gax.grpc.ApiException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Invalid Wav File: Not 16 bit Linear PCM or 8 bit MULAW.

Can anyone help me solve this?

Upvotes: 2

Answers (2)

stlevkov

Reputation: 41

The Google Cloud Speech API does accept WAV files, among other formats. From the documentation: https://cloud.google.com/speech-to-text/docs/encoding

Note: Speech-to-Text supports WAV files with LINEAR16 or MULAW encoded audio.

On the speech-to-text/ page of the Google Cloud web site, I was able to upload a .wav file encoded as 8-bit PCM signed, and the transcription was fine. However, if I upload the same file through their Java API, it gives this error. I tested several library versions, including the new beta, so I agree it is strange. The 8-bit PCM signed file plays in every player at both 16 kHz and 8 kHz, and there is nothing wrong with its header; I can upload it through the web page, but not via the Java API.

Here are some tips that may help: if you want to use the API, you must convert the audio to 16-bit PCM or 8-bit ULAW, for example.

For that purpose you can use the javax.sound.sampled library.
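As a rough illustration of that conversion, here is a minimal sketch that re-encodes a WAV file held in memory as 16-bit signed little-endian PCM using only javax.sound.sampled. The names WavToLinear16 and toLinear16Wav are made up for this example, and it assumes the sample rate and channel count should stay unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper: re-encodes any WAV that Java Sound can read as 16-bit PCM.
final class WavToLinear16 {

    static byte[] toLinear16Wav(byte[] wavBytes)
            throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream source =
                 AudioSystem.getAudioInputStream(new ByteArrayInputStream(wavBytes))) {
            AudioFormat src = source.getFormat();
            // Same rate and channels as the input, but 16-bit signed little-endian samples.
            AudioFormat target = new AudioFormat(
                    AudioFormat.Encoding.PCM_SIGNED,
                    src.getSampleRate(),
                    16,
                    src.getChannels(),
                    src.getChannels() * 2,   // frame size in bytes
                    src.getSampleRate(),
                    false);
            try (AudioInputStream converted = AudioSystem.getAudioInputStream(target, source)) {
                // Buffer the samples first so the WAV writer gets a known frame count.
                ByteArrayOutputStream pcm = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = converted.read(buffer)) != -1) {
                    pcm.write(buffer, 0, n);
                }
                byte[] samples = pcm.toByteArray();
                AudioInputStream sized = new AudioInputStream(
                        new ByteArrayInputStream(samples), target,
                        samples.length / target.getFrameSize());
                ByteArrayOutputStream wav = new ByteArrayOutputStream();
                AudioSystem.write(sized, AudioFileFormat.Type.WAVE, wav);
                return wav.toByteArray();
            }
        }
    }
}
```

The result is a complete WAV file (header plus samples), so it could be written to disk with Files.write or passed on as a byte array.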

Take a look at the SimpleSoundCapture example, which you can modify for your needs. Feed its "out" variable (a ByteArrayOutputStream) to the API, or record a WAV file with a custom encoding such as ULAW.

You can use these settings:

AudioFormat.Encoding encoding = AudioFormat.Encoding.ULAW;
float rate = 16000.0f;
int channels = 1;
int frameSize = 1; // bytes per frame: channels * sampleSize / 8
int sampleSize = 8;
boolean bigEndian = true; // byte order has no effect on 8-bit samples
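To show how settings like these fit together, here is a minimal sketch that builds a ULAW AudioFormat and uses it to encode raw 16-bit PCM samples into a ULAW WAV file in memory. The names UlawWavEncoder, ulawFormat and pcm16ToUlawWav are invented for this example. It assumes the input is headerless 16-bit signed little-endian PCM, uses a frame size of one byte per channel, and leaves the big-endian flag false, since byte order has no meaning for single-byte samples.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper: encodes raw 16-bit PCM samples as an 8-bit ULAW WAV file.
final class UlawWavEncoder {

    // 8-bit ULAW: one byte per sample, so frame size equals the channel count.
    static AudioFormat ulawFormat(float rate, int channels) {
        return new AudioFormat(AudioFormat.Encoding.ULAW, rate, 8, channels, channels, rate, false);
    }

    static byte[] pcm16ToUlawWav(byte[] pcm16, float rate, int channels) throws IOException {
        // Assumed input layout: 16-bit signed little-endian PCM without a header.
        AudioFormat pcm = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED, rate, 16, channels, channels * 2, rate, false);
        AudioFormat ulaw = ulawFormat(rate, channels);
        long frames = pcm16.length / pcm.getFrameSize();
        try (AudioInputStream source =
                 new AudioInputStream(new ByteArrayInputStream(pcm16), pcm, frames);
             AudioInputStream encoded = AudioSystem.getAudioInputStream(ulaw, source)) {
            // Buffer the encoded samples so the WAV writer gets a known frame count.
            ByteArrayOutputStream samples = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = encoded.read(buffer)) != -1) {
                samples.write(buffer, 0, n);
            }
            byte[] ulawBytes = samples.toByteArray();
            AudioInputStream sized = new AudioInputStream(
                    new ByteArrayInputStream(ulawBytes), ulaw,
                    ulawBytes.length / ulaw.getFrameSize());
            ByteArrayOutputStream wav = new ByteArrayOutputStream();
            AudioSystem.write(sized, AudioFileFormat.Type.WAVE, wav);
            return wav.toByteArray();
        }
    }
}
```

Calling UlawWavEncoder.pcm16ToUlawWav(samples, 16000.0f, 1) would then give the bytes of a mono 16 kHz ULAW WAV file; 8000.0f works the same way.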

Google recognized the ULAW sample without a problem by the 2nd or 3rd attempt, and sometimes even on the first. I tested with 8000.0f too. Before using ULAW, remove the setEncoding() call from the RecognitionConfig builder of the Speech API, because setting it together with ULAW may cause another exception.

RecognitionConfig config = RecognitionConfig.newBuilder()
        // .setEncoding(AudioEncoding.UNRECOGNIZED)
        .setLanguageCode("en-US")
        .setSampleRateHertz(16000)
        .build();

Upvotes: 4

Frauke

Reputation: 1582

The Google Cloud Speech API does not accept .wav files. You'll have to convert your current WAV file to a headerless, uncompressed LINEAR16 file (using something like Audacity, for instance).
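If you would rather do this step in Java than in Audacity, here is a minimal sketch that strips the header from a WAV file using javax.sound.sampled. The names WavHeaderStripper and stripHeader are made up for this example, and it assumes the file already contains 16-bit signed little-endian PCM; anything else is rejected rather than converted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper: returns only the sample bytes of a 16-bit PCM WAV file.
final class WavHeaderStripper {

    static byte[] stripHeader(byte[] wavBytes)
            throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in =
                 AudioSystem.getAudioInputStream(new ByteArrayInputStream(wavBytes))) {
            AudioFormat format = in.getFormat();
            boolean linear16 = AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                    && format.getSampleSizeInBits() == 16
                    && !format.isBigEndian();
            if (!linear16) {
                throw new IllegalArgumentException(
                        "Expected 16-bit signed little-endian PCM, got: " + format);
            }
            // The stream is positioned after the header, so reading it yields raw samples.
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                raw.write(buffer, 0, n);
            }
            return raw.toByteArray();
        }
    }
}
```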

Also, if you're using a local file, it can't be longer than one minute. Longer files need to be uploaded to a Google Cloud Storage bucket first.
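To check which side of that limit a file falls on, here is a minimal sketch that computes the duration of a WAV file as frame count divided by frame rate. The names WavDuration, seconds and needsCloudStorage are invented for this example, and the 60-second constant simply restates the one-minute limit mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper: measures WAV duration against the one-minute limit for local files.
final class WavDuration {

    // Assumption: the one-minute limit from the answer above, expressed in seconds.
    static final double LOCAL_FILE_LIMIT_SECONDS = 60.0;

    static double seconds(byte[] wavBytes)
            throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in =
                 AudioSystem.getAudioInputStream(new ByteArrayInputStream(wavBytes))) {
            // duration = number of frames / frames per second
            return in.getFrameLength() / (double) in.getFormat().getFrameRate();
        }
    }

    static boolean needsCloudStorage(byte[] wavBytes)
            throws IOException, UnsupportedAudioFileException {
        return seconds(wavBytes) > LOCAL_FILE_LIMIT_SECONDS;
    }
}
```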

Upvotes: 1
