Reputation: 21
Is there a way to get the sample text in between two timestamps in the string below?
string=[3/24/17, 8:34:00 PM] Hello [3/24/17, 8:35:22 PM] THIS TEXT [3/24/17, 8:39:07 PM] Bye [3/24/17, 8:39:19 PM]
Using the regex (\[.*?\](.*?)\[.*?\]), I am only able to get Hello and Bye.
What can I do to get the text in between the second and third timestamps?
Upvotes: 1
Views: 386
Reputation: 2517
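The problem with your regex is that Python searches the string linearly, and each match consumes the text it covers. Picture the input as
[date1]first[date2]second[date3]third[date4]
When first is found, both date1 and date2 are consumed by that match, so the next search starts at second. Python therefore won't find second, because what remains at that point no longer fits the [date]text[date] shape: date3 ends up paired with third instead.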
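To make the consumed timestamp visible, here is a minimal sketch using the string from the question. The second pattern is only one possible fix and is my own assumption, not something from the question: it checks for the next [ with a lookahead instead of consuming it, so the same timestamp can open the following match.

```python
import re

s = ("[3/24/17, 8:34:00 PM] Hello [3/24/17, 8:35:22 PM] THIS TEXT "
     "[3/24/17, 8:39:07 PM] Bye [3/24/17, 8:39:19 PM]")

# Pattern from the question (outer group dropped): the closing timestamp
# is consumed by each match, so every second message is skipped.
consuming = re.findall(r'\[.*?\](.*?)\[.*?\]', s)

# Assumed alternative: (?=\[) only peeks at the next "[", leaving the
# timestamp available as the start of the next match.
lookahead = [text.strip() for text in re.findall(r'\[.*?\](.*?)(?=\[)', s)]

print(consuming)  # [' Hello ', ' Bye ']
print(lookahead)  # ['Hello', 'THIS TEXT', 'Bye']
```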
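IMHO you can try one of two things:
(.*?\](.*?)\[.*?) searches for the text between a closing square bracket and the next opening one; the text is in the second group.
(\[.*?\](.*?)) searches for the string that follows a date. Note that a trailing lazy (.*?) matches the empty string on its own, so it needs something after it, for example a lookahead such as (?=\[|$).
Upvotes: 0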
Reputation: 6494
You could use re.findall with a lazy quantifier (?) to match the text between ] and [:
import re
s = "[3/24/17, 8:34:00 PM] Hello [3/24/17, 8:35:22 PM] THIS TEXT [3/24/17, 8:39:07 PM] Bye [3/24/17, 8:39:19 PM]"
m = re.findall(r'\]\s(.*?)\s\[', s)
print(m)
Output:
['Hello', 'THIS TEXT', 'Bye']
Upvotes: 0
Reputation: 5405
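This one is a bit more restrictive: it matches the actual time and date formats rather than anything in square brackets, so whether it suits you depends on how rigorous you want the matching to be.
>>> import re
>>> regex = r"""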
... \d+:\d+:\d+\s[AP]M\] # Match end time text.
... \s*(.*?)\s* # Group text between time and date, excluding spaces on each end.
... \[\d+/\d+/\d+ # Match begin date text.
... """
>>>
>>> re.findall(regex, string, flags=re.VERBOSE)
['Hello', 'THIS TEXT', 'Bye']
Upvotes: 1
Reputation: 522030
One approach is to use re.split with the regex pattern \s*\[.*?\]\s* to split the input on the timestamps, leaving behind the text you want as separate entries in a list. I also filter out empty strings to deal with edge cases where the string starts or ends with a timestamp (which would generate an empty string on the left or right).
import re
string = "[3/24/17, 8:34:00 PM] Hello [3/24/17, 8:35:22 PM] THIS TEXT [3/24/17, 8:39:07 PM] Bye [3/24/17, 8:39:19 PM]"
parts = re.split(r'\s*\[.*?\]\s*', string)
# filter() returns an iterator in Python 3, so wrap it in list() before printing.
parts = list(filter(None, parts))
print(parts)
This prints:
['Hello', 'THIS TEXT', 'Bye']
Upvotes: 0
Reputation: 6112
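You can match runs of letters and spaces with the character class [a-z A-Z], as long as the messages contain nothing else:
import re
string = '[3/24/17, 8:34:00 PM] Hello [3/24/17, 8:35:22 PM] THIS TEXT [3/24/17, 8:39:07 PM] Bye [3/24/17, 8:39:19 PM]'
print(re.findall(r' ([a-z A-Z]+) ', string))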
# ['Hello', 'THIS TEXT', 'Bye']
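One caveat worth checking before relying on a letters-only class: a message with digits or punctuation gets cut short. The sketch below uses a made-up sample string (it is not from the question) and compares the result with a pattern that matches between ] and [.

```python
import re

# Hypothetical message containing a digit; not taken from the question.
sample = "[3/24/17, 8:34:00 PM] See you at 5 [3/24/17, 8:35:22 PM]"

# Letters and spaces only: the match stops in front of the digit.
letters_only = re.findall(r' ([a-z A-Z]+) ', sample)

# Anything between "] " and " [": the whole message is kept.
between_brackets = re.findall(r'\]\s(.*?)\s\[', sample)

print(letters_only)      # ['See you at']
print(between_brackets)  # ['See you at 5']
```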
Upvotes: 0