I'm using distil-whisper (distil-large-v3, sequential long-form) through the 🤗 Transformers pipeline for speech recognition. When I set return_timestamps=True, the timestamps reset to 0 every 30 seconds instead of continuing to increment throughout the entire audio file.
Here's my current code:
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    torch_dtype=torch_dtype,
    device=device,
    return_timestamps=True,
)
result = pipe("audio.mp4")
The timestamps in the output look like this:
{'chunks': [
    {'text': 'First segment', 'timestamp': (0.0, 5.2)},
    {'text': 'Second segment', 'timestamp': (5.2, 12.8)},
    {'text': 'Later segment', 'timestamp': (28.4, 30.0)},
    {'text': 'Should be ~35s but shows', 'timestamp': (0.0, 4.6)},  # Resets here!
    ...
]}
I expect the timestamps to continue incrementing past 30 seconds, like this:
{'chunks': [
    {'text': 'First segment', 'timestamp': (0.0, 5.2)},
    {'text': 'Second segment', 'timestamp': (5.2, 12.8)},
    {'text': 'Later segment', 'timestamp': (28.4, 30.0)},
    {'text': 'Continues properly', 'timestamp': (30.0, 34.6)},  # Should continue
    ...
]}
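To make the expected behavior concrete, here is a rough post-processing sketch that maps the first output shape to the second. It is only a workaround under stated assumptions, not a fix in the pipeline itself: the helper name make_timestamps_continuous is hypothetical, a reset is guessed whenever a chunk starts earlier than the previous chunk ended, and each new window is assumed to begin where the previous chunk ended. Real audio with silence between windows may not satisfy that last assumption.

```python
def make_timestamps_continuous(chunks):
    """Return copies of `chunks` with timestamps shifted so they keep increasing.

    Hypothetical helper: assumes each chunk is a dict with a
    'timestamp': (start, end) tuple, where `end` may be None.
    """
    fixed = []
    offset = 0.0        # seconds added to every chunk in the current window
    prev_raw_end = 0.0  # end of the previous chunk, relative to its window
    prev_abs_end = 0.0  # end of the previous chunk, on the continuous timeline

    for chunk in chunks:
        start, end = chunk["timestamp"]

        # Heuristic: a start earlier than the previous end means the
        # timestamps were reset, so a new window is assumed to begin
        # where the previous chunk ended.
        if start < prev_raw_end:
            offset = prev_abs_end

        abs_start = start + offset
        abs_end = None if end is None else end + offset
        fixed.append({**chunk, "timestamp": (abs_start, abs_end)})

        # Only update the running ends when this chunk has a known end.
        if end is not None:
            prev_raw_end = end
            prev_abs_end = abs_end

    return fixed


if __name__ == "__main__":
    # Sample data mirroring the output shown in the question.
    sample = [
        {"text": "First segment", "timestamp": (0.0, 5.2)},
        {"text": "Second segment", "timestamp": (5.2, 12.8)},
        {"text": "Later segment", "timestamp": (28.4, 30.0)},
        {"text": "After the reset", "timestamp": (0.0, 4.6)},
    ]
    for chunk in make_timestamps_continuous(sample):
        print(chunk)
```

With the pipeline output from above, this would be called as make_timestamps_continuous(result["chunks"]). Because the reset detection is a heuristic, it can misfire if chunks legitimately overlap.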
How can I fix this timestamp reset issue? Is there a way to make the timestamps continue incrementing throughout the entire audio file?