Martin

Reputation: 51

Increase volume from Google's Text-to-Speech (WAV Audio processing, React Native)

I am using Google's Text-to-Speech API on my backend. I want to play the resulting audio in my Expo application, but the volume is too low on physical devices (it is fine on simulators).

Right now, I generate the audio with the LINEAR16 encoding in my AudioConfig and send it to my app over WebSockets. According to the documentation, this data includes a WAV header: https://cloud.google.com/text-to-speech/docs/reference/rest/v1/AudioConfig
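To check what actually arrives over the socket, the header can be read before anything else is done with the bytes. The following is a minimal sketch, assuming the payload is a standard RIFF/WAVE file with a `fmt ` chunk followed (possibly after other chunks) by a `data` chunk. The names `WavInfo`, `readTag`, and `readWavInfo` are hypothetical helpers, not part of any library.

```typescript
interface WavInfo {
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
  dataOffset: number; // byte offset of the first PCM sample
  dataLength: number; // number of PCM bytes available
}

// Reads a 4-character ASCII tag such as "RIFF" or "data".
const readTag = (view: DataView, offset: number): string =>
  String.fromCharCode(
    view.getUint8(offset),
    view.getUint8(offset + 1),
    view.getUint8(offset + 2),
    view.getUint8(offset + 3),
  );

// Walks the RIFF chunks instead of assuming a fixed 44-byte header.
function readWavInfo(buffer: ArrayBuffer): WavInfo {
  const view = new DataView(buffer);
  if (buffer.byteLength < 12 || readTag(view, 0) !== 'RIFF' || readTag(view, 8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE buffer');
  }
  let channels = 0;
  let sampleRate = 0;
  let bitsPerSample = 0;
  let offset = 12; // first chunk starts right after "RIFF", size, "WAVE"
  while (offset + 8 <= buffer.byteLength) {
    const id = readTag(view, offset);
    const size = view.getUint32(offset + 4, true); // little-endian
    const body = offset + 8;
    if (id === 'fmt ') {
      channels = view.getUint16(body + 2, true);
      sampleRate = view.getUint32(body + 4, true);
      bitsPerSample = view.getUint16(body + 14, true);
    } else if (id === 'data') {
      // Clamp in case the declared size is larger than what was received.
      const dataLength = Math.min(size, buffer.byteLength - body);
      return { channels, sampleRate, bitsPerSample, dataOffset: body, dataLength };
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error('No data chunk found');
}
```

For LINEAR16 output one would expect `bitsPerSample` to be 16. If it is, every sample is a signed two-byte little-endian integer, which matters for any volume processing done on the client.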

I then convert the data to Base64 on the frontend with this function:

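// Assumes the 'buffer' polyfill package is installed (Buffer is not a React Native global).
import { Buffer } from 'buffer';

// Encode the raw bytes directly: spreading a large Uint8Array into
// String.fromCharCode can exceed the engine's argument limit.
const arrayBufferToBase64 = (buffer: ArrayBuffer): string =>
    Buffer.from(new Uint8Array(buffer)).toString('base64');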

I then save the result as a .wav file and play it with react-native-track-player:

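// base64AudioEncodedString is the value returned by arrayBufferToBase64.
const fileName = `speech_${Date.now()}.wav`;
const uniqueFilename = `${FileSystem.cacheDirectory}${fileName}`;

// Write the Base64 payload to the cache directory as a binary .wav file.
await FileSystem.writeAsStringAsync(uniqueFilename, base64AudioEncodedString, {
  encoding: FileSystem.EncodingType.Base64
});

const track = {
  url: uniqueFilename,
  title: fileName,
  artist: senderId
};

// Queue the saved file for playback.
await TrackPlayer.add(track);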

What I tried:

  1. I added the volumeGainDb field to the TTS AudioConfig, but it did not increase the volume much.

  2. I tried to raise the volume through react-native-track-player, but it was already set to the maximum (1.0).

  3. I tried to modify the array manually by multiplying each byte, but my lack of knowledge of audio processing (especially handling the WAV header) led to bad results. I would really appreciate help with modifying the audio data directly in my arrayBufferToBase64 function.
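One likely reason the byte-by-byte attempt in item 3 sounded wrong: LINEAR16 samples are signed 16-bit little-endian integers, so each sample spans two bytes, and scaling the bytes independently (or touching the header) corrupts the signal. Below is a minimal sketch of sample-level gain, assuming 16-bit PCM inside a RIFF/WAVE container. `amplifyWav16` is a hypothetical helper name, and the function does not check the `fmt ` chunk, so it should only be given data known to be 16-bit.

```typescript
// Returns a louder copy of a 16-bit PCM WAV buffer; the input is not modified.
function amplifyWav16(buffer: ArrayBuffer, gain: number): ArrayBuffer {
  const out = buffer.slice(0);
  const view = new DataView(out);
  const tag = (o: number): string =>
    String.fromCharCode(
      view.getUint8(o), view.getUint8(o + 1), view.getUint8(o + 2), view.getUint8(o + 3),
    );
  if (out.byteLength < 12 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE buffer');
  }

  // Locate the "data" chunk so the header bytes are left untouched.
  let offset = 12;
  let dataStart = -1;
  let dataEnd = -1;
  while (offset + 8 <= out.byteLength) {
    const size = view.getUint32(offset + 4, true);
    if (tag(offset) === 'data') {
      dataStart = offset + 8;
      dataEnd = Math.min(dataStart + size, out.byteLength);
      break;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  if (dataStart < 0) {
    throw new Error('No data chunk found');
  }

  // Scale each sample and clamp to the signed 16-bit range to avoid wrap-around.
  for (let i = dataStart; i + 1 < dataEnd; i += 2) {
    const scaled = Math.round(view.getInt16(i, true) * gain);
    view.setInt16(i, Math.max(-32768, Math.min(32767, scaled)), true);
  }
  return out;
}
```

With this in place, the Base64 step could become `arrayBufferToBase64(amplifyWav16(buffer, 2))`. A gain of 2 is roughly +6 dB. Samples that hit the clamp are clipped, which is audible as distortion, so if the speech is already close to full scale there is little headroom left to gain this way.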

Thank you very much for your help. Any insight is welcome.

Upvotes: 1

Views: 130

Answers (0)
