Reputation: 1541
I'm trying to match date formats with regex. An example date of each is:
02 Apr 15
02 Apr 2015
The regex I'm using to match the first one is:
re.compile("([0-9]{2}) ([A-Z][a-z]{2}) ([0-9]{2})")
And for the second:
re.compile("([0-9]{2}) ([A-Z][a-z]{2}) ([0-9]{4})")
The problem is that the second date also matches the first regex, even though its year has 4 digits rather than 2. I wanted to anchor the regex at the end of the line, but sometimes a time is appended to the date (e.g. 4:32). So I want the first regex to match its date when it is followed by either nothing or a space plus arbitrary text. The first one should match:
"02 Apr 15"
"02 Apr 15 5:23"
but not match:
"02 Apr 2015"
"02 Apr 2015 5:23"
The reverse should hold for the other regex. In short, the only values that matter are the first three fields (dd Mmm YY and dd Mmm YYYY).
Upvotes: 1
Views: 624
Reputation: 784998
What you're looking for is a word boundary (\b), i.e.:
re.compile(r"\b([0-9]{2}) ([A-Z][a-z]{2}) ([0-9]{2})\b")
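This ensures that a 4-digit year is not matched when you try to match the first date format in your examples: after the two year digits, \b requires a non-word character or the end of the string, so a following digit prevents the match.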
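A quick way to check the word-boundary behaviour is to run both patterns over the four sample strings from the question. This is only a sketch: the names short_year, long_year, and results are made up for illustration, and adding \b to the 4-digit pattern as well is an assumption, since the answer only shows the 2-digit one.

```python
import re

# Patterns from the question, with \b added on both sides (raw strings
# so the backslashes do not need doubling).
short_year = re.compile(r"\b([0-9]{2}) ([A-Z][a-z]{2}) ([0-9]{2})\b")
long_year = re.compile(r"\b([0-9]{2}) ([A-Z][a-z]{2}) ([0-9]{4})\b")

samples = ["02 Apr 15", "02 Apr 15 5:23", "02 Apr 2015", "02 Apr 2015 5:23"]

# Map each sample to (matches 2-digit pattern, matches 4-digit pattern).
results = {
    s: (bool(short_year.search(s)), bool(long_year.search(s)))
    for s in samples
}

for text, (is_short, is_long) in results.items():
    print(f"{text!r}: short={is_short}, long={is_long}")
```

Each sample should match exactly one of the two patterns, whether or not a time follows the date.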
However, you should consider using Python's date-parsing routines, such as datetime.strptime, instead of a regex.
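As a rough sketch of the date-parsing route, assuming datetime.strptime from the standard library is the routine meant here: parse_date is a hypothetical helper, not something from the question, and %b depends on the current locale, so this assumes English month abbreviations.

```python
from datetime import datetime


def parse_date(text):
    """Return a date for 'dd Mmm YY' or 'dd Mmm YYYY', or None if neither fits."""
    # Only the first three whitespace-separated fields matter; a trailing
    # time such as "5:23" is dropped before parsing.
    date_part = " ".join(text.split()[:3])
    # Try the 2-digit year first. strptime rejects "2015" for %y because
    # unconverted digits remain, so the 4-digit format is tried next.
    for fmt in ("%d %b %y", "%d %b %Y"):
        try:
            return datetime.strptime(date_part, fmt).date()
        except ValueError:
            continue
    return None


print(parse_date("02 Apr 15 5:23"))
print(parse_date("02 Apr 2015"))
```

Unlike the regex, this also rejects impossible dates such as "31 Feb 15", at the cost of no longer telling you which of the two formats was used.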
Upvotes: 1