Clyde Barrow

Reputation: 2102

Python regex - any substring matches

I want to find dates in the formats 18-05-2018 and 18-05-18, but not 2018-05-18. I want to use regular expressions such that I get True when such a date appears in a string.

So it should return True for these strings:

But it should return False for these strings:

How can I do this? I found the findall() method and the pattern '\d{1,2}[-]\d{1,2}[-]\d{2,4}', but it returned True for the last two strings, because it found 18-05-18 in them.

Upvotes: 2

Views: 1428

Answers (4)

The fourth bird

Reputation: 163207

You could use a negative lookbehind and a negative lookahead to assert that there are no digits on the left and on the right side. To match either 2 or 4 digits at the end you could use an alternation:

(?<!\d)\d{2}-\d{2}-(?:\d{4}|\d{2})(?!\d)

import re
# Named s rather than str to avoid shadowing the built-in str type
s = 'ggggg18-05-2018ggggg12345678'
print(re.findall(r'(?<!\d)\d{2}-\d{2}-(?:\d{4}|\d{2})(?!\d)', s))
# ['18-05-2018']

Note that you can use the hyphen without the character class.

Upvotes: 0

Austin

Reputation: 26039

Use negative lookbehind and lookahead:

import re

s = 'sasdassdsadasdadas18-05-2018sdaq1213211214142'

print(re.findall(r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)', s))
# ['18-05-2018']

This ensures that no digit immediately precedes or follows the matched date.


To prove that it handles your error case:

import re

s = 'sasdassdsadasdadas2018-05-2018sdaq1213211214142'

print(re.findall(r'(?<!\d)\d{1,2}[-]\d{1,2}[-]\d{2,4}(?!\d)', s))
# []

Upvotes: 3

Tim Biegeleisen

Reputation: 520888

One approach is to check that what comes before the start of the date match is either a non-digit or the start of the input, and that what comes after the date match is either a non-digit or the end of the input.

import re

text = "sasdassdsadasdadas18-05-2018sdaq1213211214142"
# The capture group keeps only the date, not the surrounding non-digit characters
matches = re.findall(r'(?:\D|^)(\d{1,2}[-]\d{1,2}[-]\d{2,4})(?:\D|$)', text)
print(matches)

['18-05-2018']
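
One thing that may be worth knowing about this approach: the (?:\D|^) and (?:\D|$) groups consume the neighbouring characters, whereas lookarounds only inspect them. The sketch below uses a made-up input with two dates separated by a single space to show the difference. The variable names are illustrative only.

```python
import re

# Hypothetical input: two dates separated by a single space.
text = '18-05-2018 19-05-2018'

# Consuming version: the \D after the first date eats the space,
# so the second date has no non-digit left to match before it.
consuming = re.findall(r'(?:\D|^)(\d{1,2}-\d{1,2}-\d{2,4})(?:\D|$)', text)

# Lookaround version: the assertions check neighbours without consuming them.
lookaround = re.findall(r'(?<!\d)\d{1,2}-\d{1,2}-\d{2,4}(?!\d)', text)

print(consuming)   # ['18-05-2018']
print(lookaround)  # ['18-05-2018', '19-05-2018']
```

If only a True/False answer for the whole string is needed, both versions behave the same here, since either one finds at least one date.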

Upvotes: 1

David Z

Reputation: 131550

I'd suggest using a negative lookbehind (?<!...), which you can insert at any point in a regular expression to ensure that whatever comes right before that point does not match a certain expression (the ...). In your case, you want to ensure that what comes right before the beginning of the expression doesn't match a digit (\d), so you would insert (?<!\d) at the beginning of your regex.

If you would also like to exclude matches with the wrong number of digits at the end, as in aaaa18-05-181bbb, then you could also use a negative lookahead (?!...), which is similar to the negative lookbehind except that it ensures whatever comes after a certain point does not match an expression. In your case, to ensure that a digit does not come after the end of the match, you'd add (?!\d) at the end of your expression.
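
As a minimal sketch of the two assertions combined, the function below returns True or False as the question asks. The name contains_date and the test strings other than aaaa18-05-181bbb are assumptions for illustration. Note one deviation from the original pattern: the year is written as (?:\d{4}|\d{2}) rather than \d{2,4}, because \d{2,4} would accept the three digits 181 as a year and the lookahead alone would not reject aaaa18-05-181bbb.

```python
import re

# Lookbehind: no digit directly before the date.
# Lookahead: no digit directly after the date.
# The year is exactly 4 or exactly 2 digits (assumption: 3-digit years are unwanted).
DATE_RE = re.compile(r'(?<!\d)\d{1,2}-\d{1,2}-(?:\d{4}|\d{2})(?!\d)')

def contains_date(text):
    """Return True if text contains a date like 18-05-2018 or 18-05-18."""
    # search() scans the whole string and returns None when nothing matches
    return DATE_RE.search(text) is not None

print(contains_date('aaaa18-05-2018bbb'))  # True
print(contains_date('aaaa18-05-18bbb'))    # True
print(contains_date('aaaa2018-05-18bbb'))  # False
print(contains_date('aaaa18-05-181bbb'))   # False
```

Using re.search instead of re.findall avoids building a list when only the presence of a match matters.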

Upvotes: 0
