Reputation: 3
I'm using Google's Speech-to-Text API to convert an audio file into text. It can identify speakers, which is really cool, but it formats the info in a way that I am having some trouble with. Here are their docs on separating out speakers.
My goal is to have a single string separating out lines by their speakers, something like this:
Speaker1: Hello Tom
Speaker2: Howdy
Speaker1: How was your weekend
If I send an audio file to get transcribed, I get back something like this:
wordsObjects =
[
{
startTime: { seconds: '1'},
endTime: { seconds: '1'},
word: 'Hello',
speakerTag: 1
},
{
startTime: { seconds: '2'},
endTime: { seconds: '2'},
word: 'Tom',
speakerTag: 1
},
]
The real response contains one object per word; I've truncated it here to save space. Anything Tom says in this example should be represented by speakerTag: 2
Here's the closest I've gotten so far:
const unformattedTranscript = wordsObjects.map((currentWord, idx, arr) => {
if (arr[idx + 1]) {
if (currentWord.speakerTag === arr[idx + 1].speakerTag) {
return [currentWord.word, arr[idx + 1].word];
} else {
return ["SPEAKER CHANGE"];
}
}
});
const formattedTranscript = unformattedTranscript.reduce(
(acc, wordArr, idx, arr) => {
if (arr[idx + 1]) {
if (wordArr[wordArr.length - 1] === arr[idx + 1][0]) {
wordArr.pop();
acc.push(wordArr.concat(arr[idx + 1]));
} else {
acc.push(["\n"]);
}
}
return acc;
},
[]
);
This solution does not work if a speaker says more than two words consecutively. I've managed to confuse myself thoroughly on this one, so I'd love to be nudged in the right direction.
Thanks in advance for any advice.
Upvotes: 0
Views: 193
Reputation: 21110
You could add a chunkWhile generator function. Chunk the items as long as the speaker tag is the same, then convert each chunk into a line.
function* chunkWhile(iterable, fn) {
const iterator = iterable[Symbol.iterator]();
let {done, value: valueA} = iterator.next();
if (done) return;
let chunk = Array.of(valueA);
for (const valueB of iterator) {
if (fn(valueA, valueB)) {
chunk.push(valueB);
} else {
yield chunk;
chunk = Array.of(valueB);
}
valueA = valueB;
}
yield chunk;
}
const wordsObjects = [
{ word: 'Hello' , speakerTag: 1 },
{ word: 'Tom' , speakerTag: 1 },
{ word: 'Howdy' , speakerTag: 2 },
{ word: 'How' , speakerTag: 1 },
{ word: 'was' , speakerTag: 1 },
{ word: 'your' , speakerTag: 1 },
{ word: 'weekend', speakerTag: 1 },
];
const chunkGenerator = chunkWhile(
wordsObjects,
(a, b) => a.speakerTag === b.speakerTag,
);
let string = "";
// Each chunk is an array of consecutive word objects from the same speaker.
for (const chunk of chunkGenerator) {
const speakerTag = chunk[0].speakerTag;
const words = chunk.map(({word}) => word).join(" ");
string += `Speaker${speakerTag}: ${words}\n`;
}
console.log(string);
If you ever need to convert a generator to an array, you can use Array.from(generator) or [...generator].
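As a small illustration of those two conversions, here is a sketch using a stand-in generator. The countTo function is hypothetical and only there to keep the example self-contained; a generator returned by chunkWhile should behave the same way. It also shows that a generator object can only be consumed once, which matters if you loop over it and then try to convert it afterwards.

```javascript
// Hypothetical stand-in generator, used only for illustration.
function* countTo(n) {
  for (let i = 1; i <= n; i++) yield i;
}

// Both forms drain the generator into a plain array.
const viaFrom = Array.from(countTo(3));
const viaSpread = [...countTo(3)];

// A generator object is single-use: once drained, it yields nothing more.
const gen = countTo(3);
const first = [...gen];  // all three values
const second = [...gen]; // empty, because gen is already exhausted

console.log(viaFrom, viaSpread, first, second);
```

If you need the chunks more than once, convert to an array first and reuse the array.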
Upvotes: 1
Reputation: 329
Here's how I would do it using a reducer:
const formattedTranscript = wordsObjects.reduce((accumulator, currentValue) => {
// check if same speaker (continue on the same line)
if(accumulator.length > 0)
{
const lastItem = accumulator[accumulator.length -1];
if(lastItem.speakerTag === currentValue.speakerTag) {
lastItem.text += " " + currentValue.word;
return accumulator;
}
}
// new line (new speaker)
accumulator.push({
speakerTag: currentValue.speakerTag,
text: currentValue.word
});
return accumulator;
}, []);
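Note that the reducer above produces an array of { speakerTag, text } objects rather than the single string the question asks for. A minimal sketch of the remaining step, assuming the "Speaker<N>: " prefix format from the question; the groupBySpeaker and toTranscript names and the sampleWords data are made up for illustration:

```javascript
// Hypothetical helper: same grouping idea as the reducer, wrapped for reuse.
const groupBySpeaker = (words) =>
  words.reduce((lines, { speakerTag, word }) => {
    const last = lines[lines.length - 1];
    if (last && last.speakerTag === speakerTag) {
      last.text += " " + word; // same speaker: extend the current line
    } else {
      lines.push({ speakerTag, text: word }); // speaker changed: start a new line
    }
    return lines;
  }, []);

// Format each grouped line and join them into one newline-separated string.
const toTranscript = (words) =>
  groupBySpeaker(words)
    .map(({ speakerTag, text }) => `Speaker${speakerTag}: ${text}`)
    .join("\n");

// Illustrative input only, shaped like the word objects in the question.
const sampleWords = [
  { word: "Hello", speakerTag: 1 },
  { word: "Tom", speakerTag: 1 },
  { word: "Howdy", speakerTag: 2 },
  { word: "How", speakerTag: 1 },
];

console.log(toTranscript(sampleWords));
```

Keeping the grouping and the formatting separate means the intermediate array stays available if you later want timestamps or a different output format.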
Upvotes: 0
Reputation: 1234
I think you're overcomplicating things. You can simply iterate over the words array and track the current speaker tag. Whenever the speaker tag of the current word differs from the tracked one, start a new line; otherwise, append the word to the current line. Here's an example:
const stringifyDialog = (words) => {
let currSpeakerTag // number | undefined
let lines = [] // Array<[number, string]>, where number is speaker tag and string is the line
for (let {speakerTag, word} of words) {
if (speakerTag !== currSpeakerTag) {
currSpeakerTag = speakerTag
lines.push([speakerTag, word])
} else {
lines[lines.length - 1][1] += ` ${word}`
}
}
return lines.map(([speakerTag, line]) => `Speaker${speakerTag}: ${line}`).join('\n')
}
Given this input
const wordsObjects =
[
{
word: 'Hello',
speakerTag: 1
},
{
word: 'Tom',
speakerTag: 1
},
{
word: 'Howdy',
speakerTag: 2
},
{
word: 'How',
speakerTag: 1
},
{
word: 'was',
speakerTag: 1
},
{
word: 'your',
speakerTag: 1
},
{
word: 'weekend',
speakerTag: 1
},
]
this will produce
"Speaker1: Hello Tom
Speaker2: Howdy
Speaker1: How was your weekend"
Upvotes: 0