Reputation: 41
When using the Google Cloud Speech API, the new word-level timestamps (timecode) feature seems to allow a duration of 0
seconds for some words in the results. Here is an example:
...
{ startTime: '48.800s', endTime: '48.800s', word: 'a' },
{ startTime: '48.800s', endTime: '49.200s', word: 'kindly' },
...
Is this a bug?
To test, I used a clip from the audio archive "Arthur the Rat", "USA - General mid-western speaker (Michigan)".
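A minimal sketch for spotting such entries, assuming the word list is available as dictionaries with the same startTime, endTime, and word keys shown above. The helper names parse_offset and zero_duration_words are hypothetical, not part of the API:

```python
from decimal import Decimal


def parse_offset(text):
    """Parse an offset string such as '48.800s' into Decimal seconds."""
    return Decimal(text.rstrip("s"))


def zero_duration_words(words):
    """Return the words whose end time is not later than their start time."""
    return [
        w["word"]
        for w in words
        if parse_offset(w["endTime"]) <= parse_offset(w["startTime"])
    ]


# The two entries quoted in the question.
words = [
    {"startTime": "48.800s", "endTime": "48.800s", "word": "a"},
    {"startTime": "48.800s", "endTime": "49.200s", "word": "kindly"},
]

print(zero_duration_words(words))
```

Decimal is used here so that the comparison is exact rather than subject to floating-point rounding.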
Upvotes: 4
Views: 168
Reputation: 484
David Anderson's answer is correct. I just wanted to elaborate on it, because I initially thought the response was only precise to the second, not to 100 ms as the docs describe.
As of July 2018, a request to the Google Cloud Speech API that includes word time offsets returns a response object in which each word result in response.results
has the structure:
start_time {
seconds: 24
nanos: 100000000
}
end_time {
seconds: 24
nanos: 700000000
}
word: "of"
The nanos field lets you get the start and end times to 100 ms precision. You can obtain the start and end times like so:
print(start_time.seconds + start_time.nanos * 1e-9)
print(end_time.seconds + end_time.nanos * 1e-9)
==== Output ====
24.1
24.7
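If you need durations rather than individual timestamps, it can be safer to stay in integer arithmetic until the end. The following is only a sketch: SimpleNamespace stands in for the real start_time and end_time objects, and the helper names offset_to_seconds and word_duration_ms are my own, not part of the client library.

```python
from types import SimpleNamespace

NANOS_PER_SECOND = 1_000_000_000


def offset_to_seconds(offset):
    """Convert an object with integer seconds and nanos fields to float seconds."""
    return offset.seconds + offset.nanos / NANOS_PER_SECOND


def word_duration_ms(start_time, end_time):
    """Return the word duration in whole milliseconds using integer arithmetic."""
    start_ns = start_time.seconds * NANOS_PER_SECOND + start_time.nanos
    end_ns = end_time.seconds * NANOS_PER_SECOND + end_time.nanos
    return (end_ns - start_ns) // 1_000_000


# Stand-ins carrying the same values as the structure shown above.
start_time = SimpleNamespace(seconds=24, nanos=100_000_000)
end_time = SimpleNamespace(seconds=24, nanos=700_000_000)

print(offset_to_seconds(start_time))
print(word_duration_ms(start_time, end_time))
```

A duration of 0 from word_duration_ms would correspond to the zero-length words described in the question.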
Upvotes: 1
Reputation: 11
You can get better than one-second precision from the returned timestamp.
Take the start time from the structure containing the word and output it as follows:
start_time.seconds + start_time.nanos * 1e-9
Upvotes: 1