Joaosl

Reputation: 21

Regex: Getting text in between time stamps

Is there a way to get the sample text in between each pair of timestamps in the string below?

string=[3/24/17, 8:34:00 PM] Hello [3/24/17, 8:35:22 PM] THIS TEXT [3/24/17, 8:39:07 PM] Bye [3/24/17, 8:39:19 PM]

Using the regex (\[.*?\](.*?)\[.*?\]), I am only able to get Hello and Bye.

What can I do to get the text in between the second and third timestamps?

Upvotes: 1

Views: 386

Answers (5)

Devidas

Reputation: 2517

The problem with your regex is that Python scans the string linearly, and matches cannot overlap.

[date1]first[date2]second[date3]third[date4]

Once first is found, date1 and date2 have both been consumed, so the search resumes at second. Python will not find second, because without date2 it no longer fits the pattern [date]text[date]. The next match is therefore [date3]third[date4].

IMHO you can try one of two things:

  1. (.*?\](.*?)\[.*?) captures the text between a closing ] and the next opening [.
  2. (\[.*?\]([^\[]*)) captures the text that follows each timestamp, up to the next [.
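As a quick illustration of the non-overlapping behaviour described above, here is a minimal sketch using Python's re module. The variable names and the lookahead variant are my own assumptions for demonstration; they are not taken from the question.

```python
import re

s = ("[3/24/17, 8:34:00 PM] Hello [3/24/17, 8:35:22 PM] THIS TEXT "
     "[3/24/17, 8:39:07 PM] Bye [3/24/17, 8:39:19 PM]")

# Pattern from the question: every match consumes its closing timestamp,
# so the text after that timestamp has no opening timestamp left to pair with.
original = [text.strip() for _, text in re.findall(r"(\[.*?\](.*?)\[.*?\])", s)]
print(original)   # ['Hello', 'Bye']

# Illustrative variant: the lookahead (?=\[) only checks that a "[" follows,
# without consuming it, so the next timestamp can still open the next match.
lookahead = [text.strip() for text in re.findall(r"\[.*?\](.*?)(?=\[)", s)]
print(lookahead)  # ['Hello', 'THIS TEXT', 'Bye']
```

The lookahead is just one way to keep the shared timestamp available; the two patterns listed above avoid the problem by not requiring a full closing timestamp at all.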

Upvotes: 0

jignatius

Reputation: 6494

You could use re.findall with a lazy quantifier (*?) to capture the text between each ] and the following [:

import re

s = "[3/24/17, 8:34:00 PM] Hello [3/24/17, 8:35:22 PM] THIS TEXT [3/24/17, 8:39:07 PM] Bye [3/24/17, 8:39:19 PM]"
# Raw string so that \] and \s reach the regex engine unchanged.
m = re.findall(r'\]\s(.*?)\s\[', s)
print(m)

Output:

['Hello', 'THIS TEXT', 'Bye']

Upvotes: 0

Todd

Reputation: 5405

Depending on how rigorous you want the matching to be, here is a more restrictive pattern that matches the actual time and date formats rather than anything in square brackets.

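>>> import re
>>> string = "[3/24/17, 8:34:00 PM] Hello [3/24/17, 8:35:22 PM] THIS TEXT [3/24/17, 8:39:07 PM] Bye [3/24/17, 8:39:19 PM]"
>>> regex = r"""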
...         \d+:\d+:\d+\s[AP]M\]    # Match end time text.
...         \s*(.*?)\s*             # Group text between time and date, excluding spaces on each end.
...         \[\d+/\d+/\d+           # Match begin date text.
...         """
>>> 
>>> re.findall(regex, string, flags=re.VERBOSE)
['Hello', 'THIS TEXT', 'Bye']

Upvotes: 1

Tim Biegeleisen

Reputation: 522030

One approach is to use re.split with the regex pattern \s*\[.*?\]\s* to split the input on the timestamps, leaving the text you want as separate entries in a list. I also filter out empty strings, to deal with edge cases where the string starts or ends with a timestamp (which produces an empty string on the left or right of the split).

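import re

string = "[3/24/17, 8:34:00 PM] Hello [3/24/17, 8:35:22 PM] THIS TEXT [3/24/17, 8:39:07 PM] Bye [3/24/17, 8:39:19 PM]"
parts = re.split(r'\s*\[.*?\]\s*', string)
# filter() returns a lazy iterator in Python 3, so wrap it in list() before printing.
parts = list(filter(None, parts))
print(parts)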

This prints:

['Hello', 'THIS TEXT', 'Bye']

Upvotes: 0

alec

Reputation: 6112

You can match runs of letters and spaces with the character class [a-z A-Z], as long as the messages contain nothing else:

import re

string = '[3/24/17, 8:34:00 PM] Hello [3/24/17, 8:35:22 PM] THIS TEXT [3/24/17, 8:39:07 PM] Bye [3/24/17, 8:39:19 PM]'
print(re.findall(' ([a-z A-Z]+) ', string))
# ['Hello', 'THIS TEXT', 'Bye']

Upvotes: 0
