user11727742

How to extract text within a time range

I have the text below. How can I extract only the text that falls within a given time range? I already have code that extracts all the values.

s = '''00:00:14,099 --> 00:00:19,100
a classic math problem a

00:00:17,039 --> 00:00:28,470
will come from an unexpected place

00:00:18,039 --> 00:00:19,470

00:00:20,039 --> 00:00:21,470

00:00:22,100 --> 00:00:30,119
binary numbers first I'm going to give

00:00:30,119 --> 00:00:35,430
puzzle and then you can try to solve it

00:00:32,489 --> 00:00:37,170
like I said you have a thousand bottles'''

Can I extract the text from 00:00:17,039 --> 00:00:28,470 up to and including the entry that starts at 00:00:30,119?

Code that collects all the values:

import re

lines = s.split('\n')
subs = {}           # named subs so the built-in dict is not shadowed
current_key = None  # no timestamp line seen yet

for line in lines:
    # A timestamp line looks like "00:00:14,099 --> 00:00:19,100"
    is_key_match_obj = re.search(r'([\d:,]{12})(\s-->\s)([\d:,]{12})', line)
    if is_key_match_obj:
        current_key = is_key_match_obj.group()
        print(current_key)
        continue

    if current_key:
        if current_key in subs:
            if not line:
                subs[current_key] += '\n'
            else:
                subs[current_key] += line
        else:
            subs[current_key] = line

print(subs.values())

Expected output, from 00:00:17,039 --> 00:00:28,470 to 00:00:30,119 --> 00:00:35,430:

dict_values(['will come from an unexpected place ', '', '', "binary numbers first I'm going to give", ' puzzle and then you can try to solve it'])

Upvotes: 2

Views: 90

Answers (2)

bhaskarc

Reputation: 9521

import re
line = re.sub(r'\d{2}[:,\d]+[ \n](-->)*', "", s)
print(line)

The resulting string (shown here as its repr) is:

" a classic math problem a\n\n will come from an unexpected place\n\n \n \n binary numbers first I'm going to give\n\n puzzle and then you can try to solve it\n\n like I said you have a thousand bottles"

Explanation

\d{2}[:,\d]+ : matches two digits followed by one or more colons, commas, or digits. This covers both the start and the end timestamps.

[ \n] : matches the space after the start timestamp or the line break after the end timestamp.

(-->)* : matches zero or more occurrences of -->.
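To see how the three pieces work together, here is a minimal sketch that applies the same pattern to a single cue taken from the question. The variable names `cue`, `pattern`, `matches`, and `stripped` are only for illustration.

```python
import re

# One cue from the question's text, used only for illustration.
cue = "00:00:17,039 --> 00:00:28,470\nwill come from an unexpected place"

pattern = r'\d{2}[:,\d]+[ \n](-->)*'

# Each timestamp is matched separately: the start timestamp together with
# the following space and "-->", the end timestamp together with its
# trailing line break.
matches = [m.group(0) for m in re.finditer(pattern, cue)]
print(matches)

# Removing both matches leaves the caption text, with the one space that
# followed "-->" still in front of it.
stripped = re.sub(pattern, "", cue)
print(repr(stripped))
```

Note that the leading space survives because the pattern consumes the space before `-->` but not the one after it.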

As someone suggested in the comments, you might want to look at a parser that does this for you by building a parse tree. Parsers are more foolproof. A Google search led me to this srt python library

Upvotes: -1

Karmveer Singh

Reputation: 953

No need to iterate line by line. Try the code below; it gives you a dictionary, as you wanted.

import re
# Named subs so the built-in dict is not shadowed
subs = dict(re.findall(r'(\d{2}:\d{2}.*)\n(.*)', s))
print(subs.values())

Output

dict_values(['a classic math problem a', 'will come from an unexpected place', '', '', "binary numbers first I'm going to give", 'puzzle and then you can try to solve it', 'like I said you have a thousand bottles'])
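The dictionary above holds every cue. To keep only the cues in the range from the question, one option is to capture each start time and compare it against the bounds. This is a minimal sketch under a few assumptions: every timestamp has the form HH:MM:SS,mmm, the range is defined by cue start times (inclusive at both ends), and each cue has at most one line of text. The helpers `to_delta` and `text_between` are hypothetical names introduced here, not part of any library.

```python
import re
from datetime import timedelta

s = '''00:00:14,099 --> 00:00:19,100
a classic math problem a

00:00:17,039 --> 00:00:28,470
will come from an unexpected place

00:00:18,039 --> 00:00:19,470

00:00:20,039 --> 00:00:21,470

00:00:22,100 --> 00:00:30,119
binary numbers first I'm going to give

00:00:30,119 --> 00:00:35,430
puzzle and then you can try to solve it

00:00:32,489 --> 00:00:37,170
like I said you have a thousand bottles'''

# start timestamp, end timestamp, then the single line of text that follows
CUE = re.compile(r'(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\n(.*)')


def to_delta(stamp):
    """Convert 'HH:MM:SS,mmm' to a timedelta (hypothetical helper)."""
    hours, minutes, rest = stamp.split(':')
    seconds, millis = rest.split(',')
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(seconds), milliseconds=int(millis))


def text_between(text, first_start, last_start):
    """Return the text of cues whose start time lies in [first_start, last_start]."""
    lo, hi = to_delta(first_start), to_delta(last_start)
    return [body for start, _end, body in CUE.findall(text)
            if lo <= to_delta(start) <= hi]


selected = text_between(s, '00:00:17,039', '00:00:30,119')
print(selected)
```

For the sample text this selects the five cues that start between 00:00:17,039 and 00:00:30,119, including the two empty ones. Comparing timedelta values rather than raw strings keeps the check correct even if a file pads its timestamps differently.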

Upvotes: 2
