Reputation: 79
I need to extract a date in the format dd Month yyyy (e.g. 20 August 2013). I tried the following regex:
\d{2} (January|February|March|April|May|June|July|August|September|October|November|December) \d{4}
It works in regex testers (I checked several with the text "Monday, 19 August 2013"), but it seems that Python doesn't understand it. The output I get is:
>>>
['August']
>>>
Can somebody please explain why this is happening?
Thank you!
Upvotes: 2
Views: 82
Reputation: 179422
Did you use re.findall? By default, if there's at least one capture group in the pattern, re.findall will return only the captured parts of the expression.
You can avoid this by removing every capture group, causing re.findall to return the entire match:
\d{2} (?:January|February|...|December) \d{4}
or by making a single big capture group:
(\d{2} (?:January|February|...|December) \d{4})
or, possibly more conveniently, by making every component a capture group:
(\d{2}) (January|February|...|December) (\d{4})
This latter form is more useful if you will need to process the individual day/month/year components.
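As a quick check of the behaviour described above, here is a minimal sketch that runs the original pattern and two of the suggested variants through re.findall. It assumes the sample text from the question; the variable names (text, months, original, whole, parts) are only illustrative.

```python
import re

# Sample text taken from the question.
text = "Monday, 19 August 2013"

# Month alternation, built once so the three patterns stay readable.
months = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Original pattern: the only capture group is the month,
# so findall returns just that group.
original = re.findall(r"\d{2} (" + months + r") \d{4}", text)

# Non-capturing group (?:...): no capture groups at all,
# so findall returns the entire match.
whole = re.findall(r"\d{2} (?:" + months + r") \d{4}", text)

# Three capture groups: findall returns one tuple per match.
parts = re.findall(r"(\d{2}) (" + months + r") (\d{4})", text)

print(original)  # ['August']
print(whole)     # ['19 August 2013']
print(parts)     # [('19', 'August', '2013')]
```

The tuple form is handy if the day, month and year are going to be processed separately afterwards.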
Upvotes: 3
Reputation: 19066
It looks like you are only getting the data from the capture group. Try this:
(\d{2} (?:January|February|March|April|May|June|July|August|September|October|November|December) \d{4})
I put a capture group around the entire thing and made the month a non-capture group. Now whatever was giving you "August" should give you the entire thing.
I just looked at some Python regex stuff and found this example:
>>> import re
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
Seeing this, I'm guessing (since you didn't show how you were actually using this regex) that you were doing group(1), which will now work with the regex I supplied above.
It also looks like you could have used group(0) to get the whole thing (if I am correct in assuming that this is what you were doing). This would work with your original regex as well as my modified version.
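To illustrate the group(0) point, here is a small sketch using the original regex from the question unchanged. It is only a guess at the usage, since the question doesn't show the calling code, and the names pattern, m, full and month are made up for the example. Note that it uses search rather than match, because the sample text starts with "Monday, " and match only looks at the beginning of the string.

```python
import re

# The original regex from the question, with the month as the only capture group.
pattern = re.compile(
    r"\d{2} (January|February|March|April|May|June|July|"
    r"August|September|October|November|December) \d{4}"
)

# search() scans the whole string; match() would return None here
# because the date is not at the start of the text.
m = pattern.search("Monday, 19 August 2013")

if m:
    full = m.group(0)   # the entire match
    month = m.group(1)  # only the captured month
    print(full)   # 19 August 2013
    print(month)  # August
```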
Upvotes: 2