How to use the audio stream from Chrome's tabCapture API and convert it to text

I am trying to generate captions for any audio playing in a tab. To do this, I am building a Chrome extension that captures audio from the currently open tab and produces text from the audio stream in real time.

After some research, I found that Chrome has a tabCapture API that can capture the audio stream from the current tab. The problem is how to continuously convert the stream I get from the API into text.

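chrome.tabCapture.capture({audio: true}, (stream) => {
  // Remember which tab the capture started on.
  let startTabId;
  chrome.tabs.query({active: true, currentWindow: true}, (tabs) => { startTabId = tabs[0].id; });
  const liveStream = stream;
  // Feed the captured audio into a Web Audio graph.
  const audioCtx = new AudioContext();
  const source = audioCtx.createMediaStreamSource(stream);
  // Recorder is a third-party recorder class, not a browser built-in.
  const mediaRecorder = new Recorder(source);
});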

This is how the audio would be recorded. The stream object supposedly contains the audio data, but I am not sure how to use it to convert the audio into text.
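
To make the missing step concrete, here is a minimal sketch of one common preprocessing stage, under stated assumptions: that a speech-to-text engine of some kind would accept 16-bit mono PCM at a lower sample rate (16 kHz is a frequent choice, but that is an assumption, not something the tabCapture API prescribes). The helper names `downsample` and `floatTo16BitPCM` are hypothetical and do not come from any library. The sketch only prepares audio frames; it does not perform recognition.

```javascript
// Hypothetical helpers: prepare Web Audio sample frames for a speech recognizer.

// Reduce the sample rate by averaging each group of input samples.
// This is a naive approach with no proper low-pass filter.
function downsample(samples, fromRate, toRate) {
  if (toRate >= fromRate) return Float32Array.from(samples);
  const ratio = fromRate / toRate;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(samples.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += samples[j];
    out[i] = sum / (end - start);
  }
  return out;
}

// Convert floats in [-1, 1] (the Web Audio sample format) to 16-bit signed PCM.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range values
    pcm[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return pcm;
}
```

In an extension, the Float32 frames would have to come from a node connected to `source`, for example an AudioWorkletNode or the older ScriptProcessorNode, and each converted chunk would then be handed to whatever recognizer is chosen. That part is left out here because it depends on the engine.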

Upvotes: 3

Views: 796

Answers (1)

user149341


What you are asking for is a speech recognition engine. There is no straightforward way of implementing this feature, especially not in a browser context. It's not clear that this is even feasible given the current state of the art.

Speech recognition is a wide field of ongoing research; what you are trying to do here is not a solved problem. Even major industry players like Google have not solved it: YouTube has a feature that automatically generates captions for videos, but the resulting captions are awful. Their implementation also depends on a large machine learning effort; it is unlikely that you'd be able to implement anything of even this quality in JavaScript, running in real time in a web browser.

Upvotes: 1
