Reputation: 12826
I have a set of inputs. I am trying to write a regex to match the following pattern in the input:
Day at Time on location
Example input:
Today at 12:30 PM on Sam's living room
The Day, Time, and location parts of the text vary in each input.
I wrote the following regex:
import regex as re
input_example = "Today at 12:30 PM on Rakesh's Echo"
regexp_1 = re.compile(r'(\w+) at (\d+):(\d+) (\w+) on (\w+)')
re_match = regexp_1.match(input_example)
This works, in that the pattern matches the input. I am now trying to extract groups from within the match.
My desired output is:
re_match.group(1)
>> "Today"
re_match.group(2)
>> "12:30 PM"
re_match.group(3)
>> "Sam's living room"
However, my current regular expression match does not give me this output. What is the correct regex that will give me the above outputs?
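For reference, here is a quick check of what the pattern above actually captures. This is only a sketch: it assumes the standard-library re module in place of the regex package, and the name captured is introduced just for illustration.

```python
import re

# Same pattern as in the question: five separate groups.
regexp_1 = re.compile(r'(\w+) at (\d+):(\d+) (\w+) on (\w+)')
re_match = regexp_1.match("Today at 12:30 PM on Rakesh's Echo")

# 'captured' is an illustrative name, not part of the original code.
captured = re_match.groups() if re_match else ()
print(captured)  # ('Today', '12', '30', 'PM', 'Rakesh')
```

The time is split across groups 2 to 4, and the last group stops at the apostrophe because \w does not match it.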
Upvotes: 13
Views: 37892
Reputation: 6175
I think you want re.compile(r'(\w+) at (\d+:\d+ \w+) on (.+)') instead.
Your second group needs to capture the whole time (two numbers and a word), and your third group needs to accept more than just \w if you want to get apostrophes, etc. I'm suggesting .+, which will just get everything to the end of the line.
I've tried this and get:
Today
12:30 PM
Rakesh's Echo
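A minimal sketch of how that output can be reproduced, assuming the standard-library re module rather than the regex package from the question. The variable names are only illustrative.

```python
import re

# Three groups: the day, the full time (HH:MM plus AM/PM), and the rest of the line.
pattern = re.compile(r'(\w+) at (\d+:\d+ \w+) on (.+)')

match = pattern.match("Today at 12:30 PM on Rakesh's Echo")
if match:
    day, time_of_day, location = match.groups()
    print(day)
    print(time_of_day)
    print(location)
```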
Upvotes: 1
Reputation: 615
You are pretty close. You just need to adjust your capture groups a bit, like this:
re.compile(r"(\w+) at (\d+:\d+ \w+) on (.+)")
Note that the second capture group now matches the full hour:minute period-of-day. The final capture group (\w+) matches a-z, A-Z, 0-9 and _, but not ' or a space, so it captures only a small part of the description. Changing it to .+ allows it to match any character (except a newline). If you know that only a few characters outside of \w need to be matched, you can use a character class such as [\w']+ with whatever other characters you need included.
A good tool for experimenting with and testing your regex is https://regex101.com/. Just make sure you select Python as the language.
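To illustrate the trade-off between these options, here is a rough comparison of a few candidates for the last group on the sample input. It is a sketch only: the names prefix and results are made up for the example, and [\w' ]+ (with a space added to the class) is one assumed way of widening the character class.

```python
import re

text = "Today at 12:30 PM on Rakesh's Echo"

# Shared start of the pattern; only the final group changes below.
prefix = r"(\w+) at (\d+:\d+ \w+) on "

results = {}
for last_group in (r"(\w+)", r"([\w']+)", r"([\w' ]+)", r"(.+)"):
    m = re.match(prefix + last_group, text)
    results[last_group] = m.group(3) if m else None
    print(last_group, "->", results[last_group])
```

On this input, \w+ stops at the apostrophe, [\w']+ stops at the space, and both [\w' ]+ and .+ capture the whole location.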
Upvotes: 7
Reputation: 5914
You can use nested groups, but on their own they are not very readable: you have to work out the exact number of each group, and later you will forget what that number means.
It's better to use named groups. This is copied from the REPL:
>>> import re
...
... input_example = "Today at 12:30 PM on Rakesh's Echo"
... regexp_1 = re.compile(r'(?P<day>\w+) at (?P<time>(\d+):(\d+) (\w+)) on (?P<place>.+)')
... re_match = regexp_1.match(input_example)
>>> list(re_match.groups())
['Today', '12:30 PM', '12', '30', 'PM', "Rakesh's Echo"]
>>> re_match.group('day')
'Today'
>>> re_match.group('time')
'12:30 PM'
>>> re_match.group('place')
"Rakesh's Echo"
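Named groups can also be pulled out all at once with groupdict(). The following is a small sketch of that idea; it assumes a place group of .+ so the whole location is captured, drops the nested time groups for brevity, and uses the illustrative name fields.

```python
import re

# Named groups only; the time is captured as one piece.
pattern = re.compile(r"(?P<day>\w+) at (?P<time>\d+:\d+ \w+) on (?P<place>.+)")

match = pattern.match("Today at 12:30 PM on Rakesh's Echo")

# groupdict() maps each group name to the text it captured.
fields = match.groupdict() if match else {}
print(fields)
```

Looking values up by name, for example fields['place'], avoids having to count group positions.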
Upvotes: 13