Reputation: 937
I want to extract the dates from a large number of files. Each date is on a single line, in the format "21 September 2010", and there is only one such date in each file.
The following code returns only the month, for example "September". Why doesn't group(0) give me the whole date, like "21 September 2010"?
What is missing here? Thank you!
months = ("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
pattern = r"^\d{2} +" + "|".join(months) + r" +\d{4}$"
match = re.search(pattern, text)
if match:
    fdate = match.group(0)
Upvotes: 1
Views: 66
Reputation: 627100
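When you print your regex, you will see it looks like ^\d{2} +January|February|March|April|May|June|July|August|September|October|November|December +\d{4}$. Because | has the lowest precedence in a regex, this is an alternation of twelve separate branches: ^\d{2} +January, then February, March, and so on, and finally December +\d{4}$. The ^\d{2} + prefix belongs only to the January branch and the  +\d{4}$ suffix only to the December branch, since the month alternatives are not grouped. When you apply the pattern to 21 September 2010, the only branch that can match is the bare September, so group(0) is just September.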
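To make the precedence issue concrete, here is a minimal sketch that runs the ungrouped pattern from the question against the sample date and prints what was matched and where. The variable name `ungrouped` is only illustrative.

```python
import re

months = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# Same construction as in the question: the month alternatives are NOT grouped,
# so the prefix and suffix attach only to the first and last branch.
ungrouped = r"^\d{2} +" + "|".join(months) + r" +\d{4}$"

m = re.search(ungrouped, "21 September 2010")
print(m.group(0))  # September
print(m.span())    # (3, 12)
```

The span shows that the match starts after "21 " and stops before " 2010", which is what you would expect if only the bare September branch took part in the match.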
You need to group the month alternatives:
pattern = r"^\d{{2}} +(?:{}) +\d{{4}}$".format("|".join(months))
See the Python demo:
import re
text = "21 September 2010"
months = ("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
pattern = r"^\d{{2}} +(?:{}) +\d{{4}}$".format("|".join(months))
match = re.search(pattern, text)
if match:
    fdate = match.group(0)
    print(fdate) # => 21 September 2010
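Since the question is about dates sitting on one line inside larger files, the following sketch may be closer to the real use case. It assumes the whole file content is searched as one string, in which case ^ and $ only match at line boundaries when re.MULTILINE is set. The helper name find_date, the sample text, and the choice of \d{1,2} (to also accept single-digit days) are assumptions for illustration, not part of the original code.

```python
import re

months = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")

# re.MULTILINE lets ^ and $ match at the start and end of every line,
# not only at the start and end of the whole string.
# \d{1,2} is an assumption: it also accepts days such as "5".
pattern = re.compile(
    r"^\d{{1,2}} +(?:{}) +\d{{4}}$".format("|".join(months)),
    re.MULTILINE,
)

def find_date(text):
    """Return the first date line found in text, or None (hypothetical helper)."""
    match = pattern.search(text)
    return match.group(0) if match else None

# Hypothetical file content with the date on its own line.
sample = "Report\n21 September 2010\nBody text\n"
print(find_date(sample))  # 21 September 2010
```

If each file is instead read line by line and every line is stripped before matching, the flag is not needed and the pattern from the demo above works unchanged.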
Upvotes: 2