Reputation: 227
I am trying to generate captions for any audio playing in a browser tab. To do this, I am building a Chrome extension that captures audio from the currently open tab and produces text from the audio stream in real time.
After some research, I found that Chrome has a tabCapture API that can capture the audio stream from the current tab. The problem is how to continuously convert the stream I get from the API into text.
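chrome.tabCapture.capture({audio: true}, (stream) => {
  // Remember which tab the capture started on.
  let startTabId;
  chrome.tabs.query({active: true, currentWindow: true}, (tabs) => startTabId = tabs[0].id);
  const liveStream = stream;
  // Route the captured stream into the Web Audio graph.
  const audioCtx = new AudioContext();
  const source = audioCtx.createMediaStreamSource(stream);
  // Recorder is not a browser built-in; it is assumed to come from a separate recording library.
  let mediaRecorder = new Recorder(source);
});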
This is how the audio would be recorded. The stream object supposedly contains the audio information, but I am not sure how to use it to convert the audio into text.
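For illustration only, here is a minimal sketch of one preprocessing step that would probably be needed whichever recognizer ends up being used: Web Audio delivers samples as floats in [-1, 1], while speech recognizers commonly expect 16-bit PCM at a lower sample rate. The helper names `floatTo16BitPCM` and `downsample` are hypothetical and not part of tabCapture or the Web Audio API, and the averaging downsampler is a naive stand-in for a proper resampling filter.

```javascript
// Hypothetical helpers; neither is part of the tabCapture or Web Audio API.

// Convert Web Audio samples (floats in [-1, 1]) to signed 16-bit PCM.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Naive downsampling: average each group of input samples into one output sample.
// Returns a copy unchanged when outRate is not lower than inRate.
function downsample(samples, inRate, outRate) {
  if (outRate >= inRate) return Float32Array.from(samples);
  const ratio = inRate / outRate;
  const out = new Float32Array(Math.floor(samples.length / ratio));
  for (let i = 0; i < out.length; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(samples.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += samples[j];
    out[i] = sum / (end - start);
  }
  return out;
}
```

In the extension, the input blocks would have to come from an audio processing node attached to source, with audioCtx.sampleRate as the input rate. The target rate (16000 is a common choice) depends on the recognizer, which this sketch does not provide.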
Upvotes: 3
Views: 796
Reputation:
What you are asking for is a speech recognition engine. There is no straightforward way to implement this, especially in a browser context. It is not clear that this is even feasible given the current state of the art.
Speech recognition is a wide field of ongoing research; what you are trying to do is not a solved problem. Even major industry players like Google have not solved it: YouTube can automatically generate captions for videos, but the resulting captions are awful. Their implementation also depends on a large machine learning effort, so it is unlikely that you could build anything of even that quality in JavaScript, running in real time in a web browser.
Upvotes: 1