Victor Wang

Reputation: 937

Why does my regex not return the full match in group(0)?

I want to extract dates from a large number of files. Each date is on a line of its own, in the format "21 September 2010", and there is only one such date per file.

The following code returns only the month, for example "September". Why does group(0) not give me the whole match, like "21 September 2010"? What is missing here? Thank you!

months = ("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")

pattern = r"^\d{2} +" + "|".join(months) + r" +\d{4}$"
match = re.search(pattern, text)
if match:
    fdate = match.group(0)

Upvotes: 1

Views: 66

Answers (1)

Wiktor Stribiżew

Reputation: 627100

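When you print your regex, you will see it looks like ^\d{2} +January|February|March|April|May|June|July|August|September|October|November|December +\d{4}$. Alternation (|) has the lowest precedence of all regex operators, and the month alternatives are not grouped, so the pattern is really a choice between twelve independent branches: ^\d{2} +January, then February through November on their own, and finally December +\d{4}$. The ^\d{2} + prefix belongs only to the January branch, and the  +\d{4}$ suffix belongs only to the December branch. When you apply the pattern to 21 September 2010, the only branch that can match is the bare September, so that is what group(0) returns.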
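To see this branch splitting concretely, here is a small sketch that rebuilds the pattern from the question and lists a few of its top-level branches. Splitting the pattern string on "|" is only an illustration that happens to work because the month names contain no "|" themselves; it is not how the regex engine parses the pattern.

```python
import re

months = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# The pattern exactly as built in the question (month alternatives not grouped).
ungrouped = r"^\d{2} +" + "|".join(months) + r" +\d{4}$"

# Illustration only: with no groups in the pattern, every "|" separates
# two top-level branches, so a plain string split shows them.
branches = ungrouped.split("|")
print(branches[0])    # ^\d{2} +January
print(branches[8])    # September
print(branches[-1])   # December +\d{4}$

# Only the bare "September" branch can match this input.
match = re.search(ungrouped, "21 September 2010")
print(match.group(0))  # September
```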

You need to group the month alternatives:

pattern = r"^\d{{2}} +(?:{}) +\d{{4}}$".format("|".join(months))

See the Python demo:

import re
text = "21 September 2010"
months = ("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
pattern = r"^\d{{2}} +(?:{}) +\d{{4}}$".format("|".join(months))
match = re.search(pattern, text)
if match:
    fdate = match.group(0)
    print(fdate) # => 21 September 2010
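One more thing to watch for, since the question says the date sits on a single line inside a file: if text holds the whole file content, ^ and $ only match at the very start and end of the string unless re.MULTILINE is passed. Below is a minimal sketch under that assumption; the find_date helper and the sample string are hypothetical, made up for illustration.

```python
import re

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# Same grouped pattern as in the demo; re.MULTILINE lets ^ and $ match
# at the start and end of every line instead of the whole string only.
DATE_RE = re.compile(
    r"^\d{{2}} +(?:{}) +\d{{4}}$".format("|".join(MONTHS)),
    re.MULTILINE,
)

def find_date(text):
    """Return the first date line found in text, or None (hypothetical helper)."""
    match = DATE_RE.search(text)
    return match.group(0) if match else None

# Hypothetical file content with the date on its own line.
sample = "Report\n21 September 2010\nBody text\n"
print(find_date(sample))          # 21 September 2010
print(find_date("no date here"))  # None
```

Note that \d{2} is kept from the question, so a single-digit day such as "5 September 2010" would not match; \d{1,2} would be needed if the files can contain such dates.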

Upvotes: 2
