Reputation: 1
I am currently facing an issue while trying to capture and store synthesized speech from FreeTTS as a byte array. The goal is to take the transcribed text, generate an audio output using FreeTTS, and store the resulting sound data in memory for further processing, such as sending it via WebSocket.
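import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;
import com.sun.speech.freetts.audio.AudioPlayer;

// VOICE_NAME is a String constant defined elsewhere in the enclosing class.
public static byte[] speakWithWavHeader(String text) throws InstantiationException {
    // Register the Kevin voice directory so VoiceManager can find the voice.
    System.setProperty("freetts.voices",
            "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");
    Voice voice = VoiceManager.getInstance().getVoice(VOICE_NAME);
    if (voice == null) {
        System.err.println("Error: voice not found.");
        return null;
    }

    // The default player is fetched here but never used below.
    AudioPlayer player = voice.getDefaultAudioPlayer();

    // Replace the voice's player with one that collects the bytes in memory.
    ByteArrayAudioPlayer rawAudioPlayer = new ByteArrayAudioPlayer();
    voice.setAudioPlayer(rawAudioPlayer);
    voice.allocate();
    voice.speak(text);
    voice.deallocate();
    byte[] rawAudioData = rawAudioPlayer.getAudioBytes();

    // Format I assume the raw data has: 16 kHz, 16-bit, mono, signed, little-endian.
    AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
    ByteArrayInputStream bais = new ByteArrayInputStream(rawAudioData);
    AudioInputStream audioInputStream = new AudioInputStream(
            bais, format, rawAudioData.length / format.getFrameSize());

    // Wrap the raw PCM in a WAV container, written to memory.
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try {
        AudioSystem.write(audioInputStream, AudioFileFormat.Type.WAVE, baos);
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
    return baos.toByteArray();
}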
However, the captured audio is consistently distorted when played back. The problem appears to stem from the way the audio is being captured. I am using a custom AudioPlayer (ByteArrayAudioPlayer) to intercept the synthesized audio data and store it in a ByteArrayOutputStream. Then, I attempt to wrap it as a WAV file with a proper header using AudioInputStream and AudioSystem.write().
import java.io.ByteArrayOutputStream;
import javax.sound.sampled.AudioFormat;
import com.sun.speech.freetts.audio.AudioPlayer;

public class ByteArrayAudioPlayer implements AudioPlayer {
    private ByteArrayOutputStream byteOutputStream;
    private AudioFormat audioFormat;

    public ByteArrayAudioPlayer() {
        // Example: 16 kHz, 16-bit, mono, PCM_SIGNED, little-endian
        this.audioFormat = new AudioFormat(16000, 16, 1, true, false);
        this.byteOutputStream = new ByteArrayOutputStream();
    }

    @Override
    public void begin(int size) {
        // Clears the buffer every time begin() is called.
        byteOutputStream.reset();
    }

    @Override
    public boolean end() {
        return true;
    }

    @Override
    public void cancel() {
    }

    @Override
    public void close() {
        try {
            byteOutputStream.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public float getVolume() {
        return 0;
    }

    @Override
    public void setVolume(float v) {
    }

    @Override
    public long getTime() {
        return 0;
    }

    @Override
    public void resetTime() {
    }

    @Override
    public AudioFormat getAudioFormat() {
        return audioFormat;
    }

    @Override
    public void pause() {}

    @Override
    public void resume() {}

    @Override
    public void reset() {
        byteOutputStream.reset();
    }

    @Override
    public boolean drain() {
        return true;
    }

    @Override
    public void startFirstSampleTimer() {}

    @Override
    public boolean write(byte[] audioData) {
        // Append the whole chunk to the in-memory buffer.
        try {
            byteOutputStream.write(audioData);
            return true;
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }
    }

    @Override
    public boolean write(byte[] audioData, int offset, int length) {
        // Append only the requested slice of the chunk.
        try {
            byteOutputStream.write(audioData, offset, length);
            return true;
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }
    }

    @Override
    public void showMetrics() {
    }

    // Returns everything captured since the last reset.
    public byte[] getAudioBytes() {
        return byteOutputStream.toByteArray();
    }

    public void setAudioFormat(AudioFormat format) {
        this.audioFormat = format;
    }
}
Despite these efforts, the resulting WAV file does not sound correct. It seems like either the byte data is being misinterpreted, or the audio format parameters (sample rate, bit depth, channels, endianness) do not match the actual FreeTTS output. Even after explicitly setting the AudioFormat (e.g., 16kHz, 16-bit, mono, PCM_SIGNED, little-endian), the issue persists.
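To illustrate why a byte-order mismatch alone could produce this kind of distortion, the sketch below encodes a few 16-bit samples as big-endian, reads them back as little-endian, and then repairs them by swapping each byte pair. It rests on the assumption (not confirmed here) that the captured bytes might be big-endian while the WAV step declares little-endian. `EndianCheck` and its methods are hypothetical helpers, not part of FreeTTS.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper (not a FreeTTS class): shows how byte order affects 16-bit PCM.
class EndianCheck {

    // Encode 16-bit samples into bytes using the given byte order.
    static byte[] encode(short[] samples, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(samples.length * 2).order(order);
        for (short s : samples) {
            buf.putShort(s);
        }
        return buf.array();
    }

    // Decode bytes into 16-bit samples using the given byte order.
    static short[] decode(byte[] pcm, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(pcm).order(order);
        short[] out = new short[pcm.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getShort();
        }
        return out;
    }

    // Swap every pair of bytes: big-endian 16-bit PCM becomes little-endian and vice versa.
    static byte[] swapByteOrder(byte[] pcm) {
        byte[] out = new byte[pcm.length];
        for (int i = 0; i + 1 < pcm.length; i += 2) {
            out[i] = pcm[i + 1];
            out[i + 1] = pcm[i];
        }
        return out;
    }

    public static void main(String[] args) {
        short[] original = {1000, -2000, 12345};
        byte[] bigEndian = encode(original, ByteOrder.BIG_ENDIAN);

        // Reading big-endian bytes as little-endian scrambles every sample.
        short[] misread = decode(bigEndian, ByteOrder.LITTLE_ENDIAN);
        // Swapping the byte pairs first restores the original values.
        short[] repaired = decode(swapByteOrder(bigEndian), ByteOrder.LITTLE_ENDIAN);

        System.out.println("misread:  " + Arrays.toString(misread));
        System.out.println("repaired: " + Arrays.toString(repaired));
    }
}
```

If the captured data were affected this way, one thing to try would be passing `true` as the last `AudioFormat` constructor argument (the big-endian flag) when wrapping the bytes, or swapping the byte pairs before writing the WAV.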
I need to determine whether the distortion is caused by the way FreeTTS outputs raw audio, how the data is captured in the ByteArrayAudioPlayer, or how the WAV header is being applied. Alternatively, I am open to using another TTS library if FreeTTS does not provide a reliable way to capture properly formatted audio.
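One way to separate these three possibilities is to test the WAV-wrapping step on its own, with data whose format is known for certain. The sketch below generates a sine tone as 16-bit mono PCM, wraps it with the same `AudioInputStream` and `AudioSystem.write()` calls used in the question, and parses the result back to inspect the header. `WavHeaderCheck`, `sineTone`, `toWav`, and `readWav` are hypothetical names introduced for this illustration. If a tone produced this way plays cleanly, the header step is probably not the cause, which would point toward the capture step or the assumed format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical diagnostic (not a FreeTTS class): checks the WAV-wrapping step in isolation.
class WavHeaderCheck {

    // Generate a sine tone as 16-bit signed mono PCM, in the byte order the format declares.
    static byte[] sineTone(AudioFormat format, double hz, double seconds) {
        int frames = (int) (format.getSampleRate() * seconds);
        byte[] pcm = new byte[frames * 2];
        for (int i = 0; i < frames; i++) {
            double angle = 2 * Math.PI * hz * i / format.getSampleRate();
            short sample = (short) (Math.sin(angle) * 0.5 * Short.MAX_VALUE);
            byte hi = (byte) (sample >> 8);
            byte lo = (byte) sample;
            pcm[2 * i] = format.isBigEndian() ? hi : lo;
            pcm[2 * i + 1] = format.isBigEndian() ? lo : hi;
        }
        return pcm;
    }

    // Same wrapping as in the question: raw PCM plus a declared format becomes WAV bytes.
    static byte[] toWav(byte[] pcm, AudioFormat format) {
        AudioInputStream in = new AudioInputStream(
                new ByteArrayInputStream(pcm), format, pcm.length / format.getFrameSize());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Parse WAV bytes back so the header's format and frame count can be inspected.
    static AudioInputStream readWav(byte[] wav) {
        try {
            return AudioSystem.getAudioInputStream(new ByteArrayInputStream(wav));
        } catch (IOException | UnsupportedAudioFileException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
        byte[] wav = toWav(sineTone(format, 440.0, 0.5), format);
        AudioInputStream parsed = readWav(wav);
        System.out.println("format: " + parsed.getFormat());
        System.out.println("frames: " + parsed.getFrameLength());
    }
}
```

The bytes returned by `toWav` can be written to a file and played to confirm the tone is clean. Running the same check on the bytes captured from the synthesizer, once with each candidate format, would show which declared format (if any) matches the data.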
I have already tried setting several different audio formats manually, but none of them fixed the distortion.
Upvotes: 0
Views: 17