user2869231

Reputation: 1541

Regex to match a non numeric value or end of string in Python

I'm trying to match date formats with regex. An example date of each is:

02 Apr 15
02 Apr 2015

The regex I'm using to match the first one is:

re.compile("([0-9]{2}) ([A-Z][a-z]{2}) ([0-9]{2})")

And for the second:

re.compile("([0-9]{2}) ([A-Z][a-z]{2}) ([0-9]{4})")

Now the issue I'm having is that the second date also matches the first regex, even though its year has 4 digits rather than 2. I wanted to add an end-of-line anchor to the regex, but sometimes a time is appended to the date (e.g. 4:32). So what I want is for the first regex to match its date when it is followed by either nothing or a space plus more text. The first one should match:

"02 Apr 15"
"02 Apr 15 5:23"

but not match:

"02 Apr 2015"
"02 Apr 2015 5:23"

The reverse should hold for the other regex. In short, the only values that matter are the first three fields (dd Mmm YY or dd Mmm YYYY).

Upvotes: 1

Views: 624

Answers (1)

anubhava

Reputation: 784998

What you're looking for is a word boundary (\b), i.e.:

re.compile("\\b([0-9]{2}) ([A-Z][a-z]{2}) ([0-9]{2})\\b")

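This makes sure a 4-digit year is not matched when trying to match the first date format in your examples: \b requires that the two year digits are not immediately followed by another digit or letter, so "20" in "2015" is rejected, while "15" followed by a space or the end of the string is accepted.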
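As a quick check, here is a minimal sketch that runs both patterns against the four sample strings from the question. The four-digit variant with \b added is an assumption extrapolated from the pattern above, and the variable names are only illustrative. Raw strings (r"...") are used so that \b does not need to be doubled.

```python
import re

# Two-digit-year pattern from the answer, written as a raw string.
two_digit_year = re.compile(r"\b([0-9]{2}) ([A-Z][a-z]{2}) ([0-9]{2})\b")
# Assumed counterpart for the four-digit-year format.
four_digit_year = re.compile(r"\b([0-9]{2}) ([A-Z][a-z]{2}) ([0-9]{4})\b")

samples = ["02 Apr 15", "02 Apr 15 5:23", "02 Apr 2015", "02 Apr 2015 5:23"]

# Collect which sample strings each pattern finds a date in.
short_hits = [s for s in samples if two_digit_year.search(s)]
long_hits = [s for s in samples if four_digit_year.search(s)]

print(short_hits)
print(long_hits)
```

Each pattern should pick up only its own two sample strings, with or without the trailing time.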

However, you should consider using Python's date-parsing routines (for example datetime.strptime) rather than a regex.
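A rough sketch of the date-parsing route, assuming the date always occupies the first three whitespace-separated fields and that month abbreviations are in English (the %b directive is locale-dependent). The helper name parse_date is hypothetical, not part of any library.

```python
from datetime import datetime

def parse_date(text):
    """Hypothetical helper: parse a leading 'dd Mmm YYYY' or 'dd Mmm YY' date.

    Returns a datetime.date, or None if neither format fits.
    """
    # Keep only the first three fields so a trailing time such as "5:23" is ignored.
    date_part = " ".join(text.split()[:3])
    # %Y requires exactly four digits and %y exactly two, so each input fits one format.
    for fmt in ("%d %b %Y", "%d %b %y"):
        try:
            return datetime.strptime(date_part, fmt).date()
        except ValueError:
            continue
    return None

print(parse_date("02 Apr 15 5:23"))
print(parse_date("02 Apr 2015"))
```

Unlike the regex, this also rejects impossible dates such as "31 Feb 15", because strptime validates the day against the month.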

Upvotes: 1
