Reputation: 186
I am trying to print all existing timestamps from a transcript (.txt file).
A short extract from the transcript:
36
00:01:36,990 --> 00:01:39,119
Text...
37
00:01:39,119 --> 00:01:41,759
Text...
38
00:01:41,759 --> 00:01:43,799
Text...
My code looks like this so far:
import re
timestamps = []
linenum = 0
pattern = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}\s-->\s\d{2}:\d{2}:\d{2},\d{3}")
for line in transcript:
    linenum += 1
    if pattern.search(line) != None:
        timestamps.append(linenum, line.rstrip('\n'))
print(timestamps)
The output is... nothing. No error or anything else. But I wish to print out all lines that contain timestamps.
I don't know what's wrong with the code or how to fix that. Can anyone please help? It'd be much appreciated.
Thank you!
Upvotes: 2
Views: 80
Reputation: 627082
list.append() accepts exactly one argument, so you need to wrap the line number and the text in a tuple (or a list) and append that to the timestamps list:
import re
timestamps = []
linenum = 0
pattern = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}\s-->\s\d{2}:\d{2}:\d{2},\d{3}")
for line in transcript:
    linenum += 1
    if pattern.search(line):
        timestamps.append((linenum, line.rstrip('\n')))
print(timestamps)
See the Python demo.
With input like this:
12:12:12,234 --> 12:13:46,346
Blah
12:14:12,121 --> 12:15:89,678
Blah2
The output is
[(1, '12:12:12,234 --> 12:13:46,346'), (3, '12:14:12,121 --> 12:15:89,678')]
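For reference, here is a self-contained sketch of the same approach, assuming the transcript can be iterated line by line. The function name find_timestamps and the sample string are made up for illustration, and enumerate() stands in for the manual linenum counter; the sample lines are taken from the extract in the question.

```python
import re

# Same pattern as above: HH:MM:SS,mmm --> HH:MM:SS,mmm
pattern = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}\s-->\s\d{2}:\d{2}:\d{2},\d{3}")


def find_timestamps(lines):
    """Return (line_number, text) tuples for each line containing a timestamp range."""
    timestamps = []
    # enumerate(..., start=1) yields 1-based line numbers without a manual counter
    for linenum, line in enumerate(lines, start=1):
        if pattern.search(line):
            # append() takes one argument, so the pair is wrapped in a tuple
            timestamps.append((linenum, line.rstrip("\n")))
    return timestamps


# Hypothetical sample built from the extract in the question
sample = """36
00:01:36,990 --> 00:01:39,119
Text...
37
00:01:39,119 --> 00:01:41,759
Text...
"""

result = find_timestamps(sample.splitlines())
print(result)
# [(2, '00:01:36,990 --> 00:01:39,119'), (5, '00:01:39,119 --> 00:01:41,759')]
```

To run it on a real file, an open file object can be passed directly, since iterating over a file yields its lines, e.g. find_timestamps(f) inside a with open("transcript.txt", encoding="utf-8") as f: block (the file name is only a placeholder).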
Upvotes: 1