Reputation: 2102
I want to find dates in the formats 18-05-2018 and 18-05-18, but not 2018-05-18. I want to use regular expressions such that I get True when such a date appears in a string.
So it should return True for these strings:
ggggg18-05-2018ggggg
ggggg18-05-2018ggggg12345678
ggggg18-05-18ggggg
ggggg18-05-18ggggg12345678
But it should return False for these strings:
ggggg2018-05-18ggggg
ggggg2018-05-18ggggg12345678
How can I do this? I found the findall() method and the pattern '\d{1,2}[-]\d{1,2}[-]\d{2,4}', but it returned True for the last two strings because it found 18-05-18 in them.
Upvotes: 2
Views: 1428
Reputation: 163207
You could use a negative lookbehind and a negative lookahead to assert that there are no digits on the left and on the right side. To match either 2 or 4 digits at the end you could use an alternation:
(?<!\d)\d{2}-\d{2}-(?:\d{4}|\d{2})(?!\d)
import re
s = 'ggggg18-05-2018ggggg12345678'  # named s to avoid shadowing the built-in str
print(re.findall(r'(?<!\d)\d{2}-\d{2}-(?:\d{4}|\d{2})(?!\d)', s))
# ['18-05-2018']
Note that you can use the hyphen without the character class.
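Since the question asks for True or False rather than the matched text, here is a minimal sketch that wraps the pattern above in a helper. The names DATE_RE and contains_date are made up for illustration, and re.search is used instead of findall because only the existence of a match matters:

```python
import re

# Pattern from the answer above: no digit before, dd-mm, then a 4- or 2-digit year, no digit after.
DATE_RE = re.compile(r'(?<!\d)\d{2}-\d{2}-(?:\d{4}|\d{2})(?!\d)')

def contains_date(text):
    """Return True if text contains a dd-mm-yyyy or dd-mm-yy date (hypothetical helper)."""
    return DATE_RE.search(text) is not None

samples = [
    'ggggg18-05-2018ggggg',
    'ggggg18-05-2018ggggg12345678',
    'ggggg18-05-18ggggg',
    'ggggg18-05-18ggggg12345678',
    'ggggg2018-05-18ggggg',
    'ggggg2018-05-18ggggg12345678',
]
results = [contains_date(s) for s in samples]
print(results)
# [True, True, True, True, False, False]
```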
Upvotes: 0
Reputation: 26039
Use negative lookbehind and lookahead:
import re
s = 'sasdassdsadasdadas18-05-2018sdaq1213211214142'
print(re.findall(r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)', s))
# ['18-05-2018']
This ensures that no digit immediately precedes or follows the matched date.
To show that it rejects a date preceded by extra digits, as in your failing case:
import re
s = 'sasdassdsadasdadas2018-05-2018sdaq1213211214142'
print(re.findall(r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)', s))
# []
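One thing to be aware of: \d{2,4} also accepts a three-digit year. If only two- or four-digit years should count (an assumption about what you want, based on the formats in the question), the year can be written as an alternation instead. A small sketch comparing the two, with loose and strict as illustrative names:

```python
import re

# Year written as \d{2,4}: accepts 2, 3 or 4 digits.
loose = re.compile(r'(?<!\d)\d{1,2}-\d{1,2}-\d{2,4}(?!\d)')
# Year written as an alternation: accepts exactly 4 or exactly 2 digits.
strict = re.compile(r'(?<!\d)\d{1,2}-\d{1,2}-(?:\d{4}|\d{2})(?!\d)')

s = 'ggggg18-05-201ggggg'
loose_matches = loose.findall(s)
strict_matches = strict.findall(s)
print(loose_matches)   # ['18-05-201']
print(strict_matches)  # []
```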
Upvotes: 3
Reputation: 520888
One approach is to check that what comes before the start of the date match is either a non-digit or the start of the input, and that what comes after the date match is either a non-digit or the end of the input.
import re
text = "sasdassdsadasdadas18-05-2018sdaq1213211214142"
# The capture group holds only the date; the surrounding non-digits are matched but not returned.
matches = re.findall(r'(?:\D|^)(\d{1,2}[-]\d{1,2}[-]\d{2,4})(?:\D|$)', text)
print(matches)
# ['18-05-2018']
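A caveat with this approach, as far as I can tell: (?:\D|^) and (?:\D|$) consume the neighbouring characters, so two dates separated by a single character cannot both be found, whereas lookarounds consume nothing. For a plain True/False check this does not matter, but it does if you collect every date. A sketch, with consuming and lookaround as illustrative names:

```python
import re

# Boundary characters are consumed as part of the match.
consuming = re.compile(r'(?:\D|^)(\d{1,2}[-]\d{1,2}[-]\d{2,4})(?:\D|$)')
# Boundaries are only asserted, not consumed.
lookaround = re.compile(r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)')

text = '18-05-18 19-05-18'
consuming_matches = consuming.findall(text)
lookaround_matches = lookaround.findall(text)
print(consuming_matches)   # ['18-05-18']
print(lookaround_matches)  # ['18-05-18', '19-05-18']
```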
Upvotes: 1
Reputation: 131550
I'd suggest using a negative lookbehind (?<!...), which you can insert at any point in a regular expression to ensure that whatever comes right before that point does not match a certain expression (the ...). In your case, you want to ensure that what comes right before the beginning of the expression doesn't match a digit (\d), so you would insert (?<!\d) at the beginning of your regex.
If you would also like to exclude matches with the wrong number of digits at the end, as in aaaa18-05-181bbb, then you could also use a negative lookahead (?!...), which is similar to the negative lookbehind except that it ensures whatever comes after a certain point does not match an expression. In your case, to ensure that a digit does not come after the end of the match, you'd add (?!\d) at the end of your expression. Note that \d{2,4} on its own still accepts the three-digit run 181 as the year, so to reject that example the year also needs to be restricted to exactly two or four digits, for instance with (?:\d{4}|\d{2}).
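To make the two steps concrete, here is a rough sketch that builds the pattern up piece by piece. It assumes the year is written as (?:\d{4}|\d{2}) rather than \d{2,4}, so that a three-digit ending is not accepted; the variable names are only for illustration:

```python
import re

# Starting point: day, month and a 2- or 4-digit year, with no boundary checks.
base = r'\d{1,2}-\d{1,2}-(?:\d{4}|\d{2})'
# Step 1: negative lookbehind, no digit directly before the match.
with_lookbehind = r'(?<!\d)' + base
# Step 2: negative lookahead, no digit directly after the match.
with_both = with_lookbehind + r'(?!\d)'

s1 = 'ggggg2018-05-18ggggg'
step0 = re.findall(base, s1)
step1 = re.findall(with_lookbehind, s1)
print(step0)  # ['18-05-18']
print(step1)  # []

s2 = 'aaaa18-05-181bbb'
before_lookahead = re.findall(with_lookbehind, s2)
after_lookahead = re.findall(with_both, s2)
print(before_lookahead)  # ['18-05-18']
print(after_lookahead)   # []
```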
Upvotes: 0