Reputation:
I have the text below. How can I extract only the text that falls within a given time range? I already have code that extracts all the values.
s = '''00:00:14,099 --> 00:00:19,100
a classic math problem a
00:00:17,039 --> 00:00:28,470
will come from an unexpected place
00:00:18,039 --> 00:00:19,470
00:00:20,039 --> 00:00:21,470
00:00:22,100 --> 00:00:30,119
binary numbers first I'm going to give
00:00:30,119 --> 00:00:35,430
puzzle and then you can try to solve it
00:00:32,489 --> 00:00:37,170
like I said you have a thousand bottles'''
Can I extract only the text from 00:00:17,039 --> 00:00:28,470
through 00:00:30,119?
Code that extracts all the values:
import re

lines = s.split('\n')
result = {}          # renamed from dict to avoid shadowing the built-in
current_key = None   # timestamp line that the following text belongs to
for line in lines:
    is_key_match_obj = re.search(r'([\d:,]{12})(\s-->\s)([\d:,]{12})', line)
    if is_key_match_obj:
        current_key = is_key_match_obj.group()
        print(current_key)
        continue
    if current_key:
        if current_key in result:
            if not line:
                result[current_key] += '\n'
            else:
                result[current_key] += line
        else:
            result[current_key] = line
print(result.values())
Expected output for the range 00:00:17,039 --> 00:00:28,470
to 00:00:30,119 --> 00:00:35,430:
dict_values(['will come from an unexpected place ', '', '', "binary numbers first I'm going to give", ' puzzle and then you can try to solve it'])
Upvotes: 2
Views: 90
Reputation: 9521
import re
line = re.sub(r'\d{2}[:,\d]+[ \n](-->)*', "", s)
print(line)
will print:
" a classic math problem a\n\n will come from an unexpected place\n\n \n \n binary numbers first I'm going to give\n\n puzzle and then you can try to solve it\n\n like I said you have a thousand bottles"
Explanation
\d{2}[:,\d]+ : matches two digits followed by one or more characters that are a colon, a comma, or a digit. This matches both the start and the end timestamp.
[ \n] : matches the space after the start timestamp or the line break after the end timestamp.
(-->)* : matches zero or more occurrences of -->.
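To see what each part of the pattern consumes, it can help to run it on a single cue. This is only a small sketch: the names cue, matches and cleaned are made up for the illustration, and the cue is the first entry of s from the question.

```python
import re

pattern = r'\d{2}[:,\d]+[ \n](-->)*'

# One cue taken from the question's sample text
cue = "00:00:14,099 --> 00:00:19,100\na classic math problem a"

# Each element is the full text consumed by one match of the pattern
matches = [m.group(0) for m in re.finditer(pattern, cue)]
print(matches)

# Removing the matches leaves the caption text; the space that sat
# between "-->" and the end timestamp is not matched, so it remains
cleaned = re.sub(pattern, "", cue)
print(repr(cleaned))
```

The first match covers the start timestamp, the following space and the arrow; the second covers the end timestamp and its line break. The leftover leading space is why each caption in the result starts with a blank, so a strip() per line may be wanted afterwards.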
As someone else suggested in the comments, you might want to look at a parser that does this for you by building a parse tree. Parsers are more foolproof. A Google search led me to this srt Python library.
Upvotes: -1
Reputation: 953
No need to iterate line by line. Try the code below. It will give you the dictionary you wanted.
import re
# raw string for the regex; result avoids shadowing the built-in dict
result = dict(re.findall(r'(\d{2}:\d{2}.*)\n(.*)', s))
print(result.values())
Output
dict_values(['a classic math problem a', 'will come from an unexpected place', '', '', "binary numbers first I'm going to give", 'puzzle and then you can try to solve it', 'like I said you have a thousand bottles'])
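That returns every caption. To keep only the captions inside a time range, one option is to convert the start timestamps to timedelta values and compare them. The following is a rough sketch using only the standard library, not a full SRT parser: the helper names (to_timedelta, parse_cues, extract_range) are invented for this example, and it assumes the range is given by the start timestamps of the first and last cue to keep.

```python
import re
from datetime import timedelta

s = '''00:00:14,099 --> 00:00:19,100
a classic math problem a
00:00:17,039 --> 00:00:28,470
will come from an unexpected place
00:00:18,039 --> 00:00:19,470
00:00:20,039 --> 00:00:21,470
00:00:22,100 --> 00:00:30,119
binary numbers first I'm going to give
00:00:30,119 --> 00:00:35,430
puzzle and then you can try to solve it
00:00:32,489 --> 00:00:37,170
like I said you have a thousand bottles'''

TIMESTAMP = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
CUE_RE = re.compile(rf"^{TIMESTAMP}\s-->\s{TIMESTAMP}$")

def to_timedelta(hours, minutes, seconds, millis):
    """Build a timedelta from the four captured timestamp fields."""
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(seconds), milliseconds=int(millis))

def parse_cues(text):
    """Return (start, end, caption) tuples; cues without text get ''."""
    cues = []
    for line in text.split("\n"):
        line = line.strip()
        match = CUE_RE.match(line)
        if match:
            fields = match.groups()
            cues.append([to_timedelta(*fields[:4]), to_timedelta(*fields[4:]), ""])
        elif cues and line:
            # Multi-line captions are joined with a newline
            cues[-1][2] += ("\n" if cues[-1][2] else "") + line
    return [tuple(cue) for cue in cues]

def extract_range(text, first_start, last_start):
    """Captions of all cues whose start lies in [first_start, last_start]."""
    low = to_timedelta(*re.fullmatch(TIMESTAMP, first_start).groups())
    high = to_timedelta(*re.fullmatch(TIMESTAMP, last_start).groups())
    return [caption for start, _end, caption in parse_cues(text)
            if low <= start <= high]

print(extract_range(s, "00:00:17,039", "00:00:30,119"))
# ['will come from an unexpected place', '', '', "binary numbers first I'm going to give", 'puzzle and then you can try to solve it']
```

Because it walks the text line by line and skips blank lines, this works whether or not the cues are separated by empty lines. Comparing on the start time only is a design choice; if cues should also be required to end inside the range, the filter can test the end value as well.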
Upvotes: 2