Armen Sanoyan

Reputation: 2042

Get the lyrics of a song with Google Speech-to-Text

In my Node.js server I am using Google's Speech-to-Text API to get the lyrics of a song, but it doesn't seem to work well with music: I lose most of the words. So my question is: does this API work with songs too, or does the audio need to be free of background noise for the API to work well?

I am using the code from the Google docs, and my config looks like this:

const config = {
  encoding: 'LINEAR16',
  sampleRateHertz: 16000,
  languageCode: 'en-US',
  enableWordTimeOffsets: true,
}

I also got bad results (most of the words were lost) with enableWordTimeOffsets: false, so I am not sure whether I am using the right tool to get lyrics from a song. Here is the code, in case the problem is there:

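// Imports the Google Cloud client library and creates a client
const speech = require('@google-cloud/speech');
const client = new speech.SpeechClient();

async function transcribeGCS() {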
  // The GCS URI to your audio file
  const gcsUri = 'gs://globally-unique-speech/10sec.wav';

  const audio = {
    uri: gcsUri,
  };

  const config = {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
    enableWordTimeOffsets: true,
  };

  const request = {
    audio: audio,
    config: config,
  };

  // Detects speech in the audio file
  const [operation] = await client.longRunningRecognize(request);
  const [response] = await operation.promise();

  response.results.forEach(result => {
    const alternative = result.alternatives[0];
    console.log(`Transcription: ${alternative.transcript}\n`);

    alternative.words.forEach(wordInfo => {
      // NOTE: If you have a time offset exceeding 2^32 seconds, use the
      // wordInfo.startTime.seconds.high and wordInfo.startTime.seconds.low properties
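      // seconds may arrive as a number, string, or Long, so coerce it;
      // nanos counts nanoseconds (1e9 per second)
      const startSecs = (
        Number(wordInfo.startTime.seconds) +
        wordInfo.startTime.nanos / 1e9
      ).toFixed(3);

      const endSecs = (
        Number(wordInfo.endTime.seconds) +
        wordInfo.endTime.nanos / 1e9
      ).toFixed(3);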

      console.log(`Word: ${wordInfo.word}`);
      console.log(`\tStart Time: ${startSecs}s`);
      console.log(`\tEnd Time: ${endSecs}s`);
    });
  });
}
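
The config above does not set a recognition model, so the request uses the default one. One variation that may be worth comparing is the `video` model together with `useEnhanced`, both of which are fields of `RecognitionConfig`. This is only a sketch: whether it recovers more words from sung vocals over music is an assumption that has not been tested here, and `withVideoModel` is a hypothetical helper name, not part of the client library.

```javascript
// Hypothetical helper: returns a copy of a RecognitionConfig that asks for
// the 'video' model and its enhanced variant. It does not call the API.
function withVideoModel(baseConfig) {
  return {
    ...baseConfig,
    model: 'video',    // assumption: may cope better with mixed audio than the default
    useEnhanced: true, // assumption: an enhanced variant is available for this model/language
  };
}

const baseConfig = {
  encoding: 'LINEAR16',
  sampleRateHertz: 16000,
  languageCode: 'en-US',
  enableWordTimeOffsets: true,
};

const musicConfig = withVideoModel(baseConfig);

// Usage inside transcribeGCS(), replacing the original config:
//   const request = { audio: audio, config: musicConfig };
//   const [operation] = await client.longRunningRecognize(request);
```

Running the same file with both configs and comparing the transcripts would show whether the model choice makes any difference for this audio.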

I also posted this question, which is close to this topic, but the difference is that here I'm trying to understand whether this is the right tool for my case.

Upvotes: 0

Views: 128

Answers (0)
