Ryan

Reputation: 10099

Trying to match a date string with a regular expression

I'm trying to match this string using a regular expression:

trying = 'Mar 20th, 2009'

I can't get the match to include the comma after the 20th. Here is what I have tried:

print (re.findall(r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*[\s]\d{2}[th , ]+', trying))
print (re.findall(r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*[\s]\d{2}[a-z,]+', trying))
print (re.findall(r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*[\s]\d{2}[a-z]+[,]', trying))

The desired output is the whole input string. What am I doing wrong?

Upvotes: 0

Views: 91

Answers (2)

Sandeep Lade

Reputation: 1943

This will work:

>>> print (re.findall(r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[\s]\d{1,2}th[,][\s]\d{4}',trying))
=> ['Mar 20th, 2009']

Now let's see why your attempts didn't give the expected result:

  1. print (re.findall(r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*[\s]\d{2}[th , ]+', trying)) -> [th , ] is a character class, so it matches any run of the characters t, h, space and comma. The match is 'Mar 20th, ' (comma and trailing space included) and stops there, because nothing in the pattern matches the year.

  2. print (re.findall(r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*[\s]\d{2}[a-z,]+', trying)) -> [a-z,]+ matches one or more lowercase letters or commas, so the match ends at the comma: 'Mar 20th,'. The space and the year are not covered by the pattern.

  3. print (re.findall(r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*[\s]\d{2}[a-z]+[,]', trying)) -> similarly, the pattern ends with [,], so the match stops at the comma: 'Mar 20th,'.
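
To check the three cases above, here is a small sketch that runs each attempt and prints what it actually returns. The names months, attempts, results and fixed are mine, not from the question, and the last pattern is only one possible fix, assuming the suffix is always lowercase letters followed by a comma, one whitespace character and a four-digit year.

```python
import re

trying = 'Mar 20th, 2009'

# Shared month alternation, factored out for readability (hypothetical name).
months = r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)'

# The three patterns from the question, unchanged apart from the factoring.
attempts = [
    months + r'[a-z]*[\s]\d{2}[th , ]+',
    months + r'[a-z]*[\s]\d{2}[a-z,]+',
    months + r'[a-z]*[\s]\d{2}[a-z]+[,]',
]

results = [re.findall(pattern, trying) for pattern in attempts]
for result in results:
    print(result)
# ['Mar 20th, ']
# ['Mar 20th,']
# ['Mar 20th,']

# None of the attempts has anything that can match the year.
# Appending whitespace and four digits covers the rest of the string.
fixed = months + r'[a-z]*\s\d{1,2}[a-z]+,\s\d{4}'
fixed_result = re.findall(fixed, trying)
print(fixed_result)
# ['Mar 20th, 2009']
```

So the comma was being matched all along; what was missing in every attempt was the year.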

Upvotes: 3

VISWESWARAN NAGASIVAM

Reputation: 352

Try this regular expression:

r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) (?:[0-9]{2}|[0-9])[rdth]{2}, \d{4}'

which matches both of these strings:

>>> x = re.findall(r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) (?:[0-9]{2}|[0-9])[rdth]{2}, \d{4}', trying)
>>> x
['Mar 20th, 2009']
>>> tryig = 'Jun 3rd, 2017'
>>> x = re.findall(r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) (?:[0-9]{2}|[0-9])[rdth]{2}, \d{4}', tryig)
>>> x
['Jun 3rd, 2017']

Update based on the comment:

>>> regex = r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}[rdth]{2}, \d{4}'
>>> x = re.findall(regex, trying)
>>> x
['Mar 20th, 2009']
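
One caveat, as far as I can tell: [rdth]{2} is a character class repeated twice, so it accepts any two of the letters r, d, t, h. That covers 'rd' and 'th' but not 'st' or 'nd', and it also lets through nonsense such as 'dd'. If those cases matter, a possible variation is to spell the suffixes out as an alternation. This is only a sketch; the names date_re, old_re, samples and matches are mine.

```python
import re

# Variation on the pattern above: explicit ordinal suffixes instead of [rdth]{2}.
date_re = re.compile(
    r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)'
    r' \d{1,2}(?:st|nd|rd|th), \d{4}'
)

# The updated pattern from this answer, kept for comparison.
old_re = re.compile(
    r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}[rdth]{2}, \d{4}'
)

samples = ['Mar 20th, 2009', 'Jun 3rd, 2017', 'Jan 1st, 2020', 'Feb 2nd, 2021']
matches = [date_re.findall(s) for s in samples]
print(matches)
# [['Mar 20th, 2009'], ['Jun 3rd, 2017'], ['Jan 1st, 2020'], ['Feb 2nd, 2021']]

print(old_re.findall('Jan 1st, 2020'))   # [] because 's' is not in [rdth]
print(old_re.findall('Mar 20dd, 2009'))  # ['Mar 20dd, 2009'] because 'dd' is accepted
```

Neither version checks that the suffix agrees with the number ('3th' still matches), which would need more than a single regular expression of this shape.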

Upvotes: 2
