Reputation: 567
I'm making a speech-to-text tool. I'm capturing audio in real time (using the Web Audio API in Chrome) and sending it to a server to convert the audio to text.
I'd like to extract pieces of the whole audio because I only want to send sentences and skip the silences (the API I use has a cost). The problem is that I don't know how to split the whole audio into pieces.
I was using MediaRecorder to capture the audio:
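// recording
this.chunks = [] // one Uint8Array per emitted chunk
this.recorder = new MediaRecorder(stream)
this.recorder.ondataavailable = async (e) => {
  const buffer = await e.data.arrayBuffer()
  this.chunks.push(new Uint8Array(buffer)) // arrays have push(), not add()
}
this.recorder.start(1000) // emit a chunk roughly every 1000 ms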
Now this.chunks holds an array of buffers, one per second of recording.
If I try to play back the whole recording by passing all of the captured buffers, decodeAudioData decodes it and it plays correctly:
// reproduce the whole audio: <- this works
const combinedChunks = this.chunks.reduce((prev, chunk) => [...prev,...chunk], [])
const arrChunks = new Uint8Array(combinedChunks)
this.repAudioContext = new AudioContext()
this.repAudioBuffer = await this.repAudioContext.decodeAudioData(
arrChunks.buffer
)
this.repSourceNode = this.repAudioContext.createBufferSource()
this.repSourceNode.buffer = this.repAudioBuffer
this.repSourceNode.connect(this.repAudioContext.destination)
this.repSourceNode.start()
That works ^, because I'm using all of the pieces.
But since I want to extract pieces of the audio, I need to be able to select only the buffers I want to play back, and I can't do that. If I leave out the first chunk, it stops working and I get: decodeAudioData - Unable to decode audio data.
// reproduce a part of the audio captured: <- this won't work
const combinedChunks = this.chunks.slice(1).reduce((prev, chunk) => [...prev,...chunk], []) // <- skipping first chunk
const arrChunks = new Uint8Array(combinedChunks)
this.repAudioContext = new AudioContext()
this.repAudioBuffer = await this.repAudioContext.decodeAudioData(
arrChunks.buffer
)
this.repSourceNode = this.repAudioContext.createBufferSource()
this.repSourceNode.buffer = this.repAudioBuffer
this.repSourceNode.connect(this.repAudioContext.destination)
this.repSourceNode.start()
I understand this might be because the first chunk contains headers or other metadata for the captured audio, but I can't find a way around it.
Can anyone give me some advice? Is there a different API I should be using? What's the proper way of extracting a smaller piece of audio from a larger one so that I can play it back and save it as a file?
Upvotes: 3
Views: 735
Reputation: 567
I've found the answer to my own question: I was using the wrong approach.
What I need in order to get the raw audio input and be able to manipulate it is an AudioWorkletProcessor.
This video helped me understand the theory behind it:
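For illustration, here is a minimal sketch of what such a processor could look like. It is not the exact code I ended up with: the file name capture-processor.js, the 'capture-processor' id, and the class names are made up for this example, and the fallback base class exists only so the file can also be loaded outside a worklet scope for a quick check.

```javascript
// capture-processor.js (hypothetical file name)
// Fallback base class: only used when AudioWorkletProcessor is not defined,
// i.e. when this file is loaded outside an AudioWorkletGlobalScope.
const BaseProcessor = globalThis.AudioWorkletProcessor ?? class {
  constructor() {
    this.port = { postMessage() {} }
  }
}

class CaptureProcessor extends BaseProcessor {
  // Called on the audio thread once per render quantum with raw PCM samples.
  process(inputs) {
    const channel = inputs[0]?.[0] // first input, first channel: a Float32Array
    if (channel && channel.length > 0) {
      // Copy before posting, since the engine may reuse the underlying buffer.
      this.port.postMessage(channel.slice())
    }
    return true // keep the processor alive
  }
}

// registerProcessor only exists inside the worklet scope.
if (typeof registerProcessor === 'function') {
  registerProcessor('capture-processor', CaptureProcessor)
}
```

On the main thread the module would be loaded with audioContext.audioWorklet.addModule('capture-processor.js'), a node created with new AudioWorkletNode(audioContext, 'capture-processor'), and the microphone connected via audioContext.createMediaStreamSource(stream).connect(node). The raw Float32Array frames then arrive in node.port.onmessage. Depending on the browser, the node may also need to be connected to audioContext.destination before process() is called; I have not checked this everywhere.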
https://www.youtube.com/watch?v=g1L4O1smMC0
And this article helped me understand how to make use of it: https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API/Using_AudioWorklet
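Once the raw samples are available as Float32Array data, dropping the silences becomes plain array work. Below is a rough sketch of one way to do it, using a per-frame RMS threshold. The function name splitOnSilence and all of the default values are assumptions chosen for the example, not something taken from the video or the article, and a real speech detector would probably need more than a fixed threshold.

```javascript
// Split mono PCM samples into non-silent segments using a per-frame RMS threshold.
// All names and default values are illustrative; tune them for your own input.
function splitOnSilence(samples, sampleRate, {
  frameSize = 1024,   // samples per analysis frame
  threshold = 0.01,   // frames with RMS below this count as silence
  minSilenceMs = 300, // a pause at least this long closes the current segment
} = {}) {
  const minSilentFrames = Math.ceil((minSilenceMs * sampleRate) / (1000 * frameSize))
  const segments = []
  let start = -1      // sample index where the open segment began, -1 if none
  let lastLoudEnd = 0 // sample index just past the last loud frame
  let silentFrames = 0

  for (let offset = 0; offset < samples.length; offset += frameSize) {
    const end = Math.min(offset + frameSize, samples.length)
    let sumSquares = 0
    for (let i = offset; i < end; i++) sumSquares += samples[i] * samples[i]
    const rms = Math.sqrt(sumSquares / (end - offset))

    if (rms >= threshold) {
      if (start === -1) start = offset
      lastLoudEnd = end
      silentFrames = 0
    } else if (start !== -1 && ++silentFrames >= minSilentFrames) {
      // Long enough pause: close the segment at the last loud frame.
      segments.push(samples.subarray(start, lastLoudEnd))
      start = -1
      silentFrames = 0
    }
  }
  // Flush a segment that was still open when the input ended.
  if (start !== -1) segments.push(samples.subarray(start, lastLoudEnd))
  return segments
}
```

Each returned segment is a view into the original samples (no copy), so it can be played back through an AudioBuffer or encoded to a file format of your choice before being sent to the server.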
Upvotes: 2