I am trying to match dates (numbers, in this case) in the following string:
mystring = '_20180701_20190630'
I am using the following code:
re.findall(r'(?:\A|_){1}([0-9]{4}[_]{0,1}[0-9]{2}[_]{0,1}[0-9]{2})(?:$|_){1}', mystring)
The pattern is overcomplicated for this particular example because I also need to handle other, more complex cases.
Given that, I do not understand why the pattern above does not match the last number, while the following one does (the only difference is the final group: (?:$){1} vs (?:$|_){1}):
re.findall(r'(?:\A|_){1}([0-9]{4}[_]{0,1}[0-9]{2}[_]{0,1}[0-9]{2})(?:$){1}', mystring)
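Running both patterns side by side makes the behaviour concrete. This is a minimal sketch assuming Python 3's standard re module; the variable names are illustrative. Note that neither pattern returns both numbers: the first finds only the first date, the second only the last.

```python
import re

mystring = '_20180701_20190630'

# Pattern 1 from the question: the closing group (?:$|_) may consume an underscore.
pattern_or = r'(?:\A|_){1}([0-9]{4}[_]{0,1}[0-9]{2}[_]{0,1}[0-9]{2})(?:$|_){1}'
# Pattern 2 from the question: the closing group accepts only the end of the string.
pattern_end = r'(?:\A|_){1}([0-9]{4}[_]{0,1}[0-9]{2}[_]{0,1}[0-9]{2})(?:$){1}'

result_or = re.findall(pattern_or, mystring)
result_end = re.findall(pattern_end, mystring)

print(result_or)   # ['20180701']
print(result_end)  # ['20190630']
```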
Why does the OR (alternation) prevent the match? Is it because it is greedy and there is another number before it?
Upvotes: 2
In the target sample _20180701_20190630 the number never starts at the beginning of the string, so \A is never what precedes it. Why is \A offered in the alternation (?:\A|_)? Can the number possibly appear at the beginning of the string, with no preceding _?
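A quick way to see when the \A alternative matters is to compare the question's pattern with a variant that requires a leading _. This is a sketch assuming Python 3's re module; the underscore_only pattern is a hypothetical variant introduced only for comparison.

```python
import re

# Question's pattern: the number may follow either the start of the string (\A) or "_".
with_start = r'(?:\A|_){1}([0-9]{4}[_]{0,1}[0-9]{2}[_]{0,1}[0-9]{2})(?:$|_){1}'
# Hypothetical variant for comparison: a "_" is required before the number.
underscore_only = r'_([0-9]{4}[_]{0,1}[0-9]{2}[_]{0,1}[0-9]{2})(?:$|_){1}'

results = {}
for text in ['_20180701', '20180701']:
    results[text] = (re.findall(with_start, text), re.findall(underscore_only, text))
    print(text, results[text])
# _20180701 (['20180701'], ['20180701'])
# 20180701 (['20180701'], [])
```

The two patterns only differ when the number sits at position 0; for the sample in the question, \A never contributes.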
Basically, if this is not a multi-line operation, the regex should be:
(?<![^_])(\d\d\d\d(?:_?\d\d){2})(?![^_])
https://regex101.com/r/HkGZEo/1
https://regex101.com/r/PkwEdK/1
https://regex101.com/r/VAREFJ/1
The lookarounds (?<![^_]) and (?![^_]) are used for boundary consistency, so the anchors can be dropped entirely.
Expanded
(?<! [^_] ) # Look Behind, a _ or BOS
( # (1 start)
\d\d\d\d
(?: _? \d\d ){2}
) # (1 end)
(?! [^_] ) # Look Ahead, a _ or EOS
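To check that the compact and expanded forms behave the same, both can be compiled with Python's re module (the expanded one with re.VERBOSE, which ignores the whitespace and # comments). This is a sketch; the sample strings are made up for illustration.

```python
import re

# Compact form from the answer.
compact = re.compile(r'(?<![^_])(\d\d\d\d(?:_?\d\d){2})(?![^_])')

# Expanded form; re.VERBOSE ignores whitespace outside character classes and # comments.
expanded = re.compile(r'''
    (?<! [^_] )            # Look Behind, a _ or BOS
    (                      # (1 start)
        \d\d\d\d
        (?: _? \d\d ){2}
    )                      # (1 end)
    (?! [^_] )             # Look Ahead, a _ or EOS
''', re.VERBOSE)

samples = ['_20180701_20190630', '20180701_20190630', '_2018_07_01', 'x20180701', '_20180701x']
results = {s: (compact.findall(s), expanded.findall(s)) for s in samples}
for s, found in results.items():
    print(s, found)
```

Because the lookarounds consume nothing, the _ between the two dates is available as the left boundary of the second match.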
Upvotes: 0
Your regex is actually matching and consuming the trailing _, which makes the next match fail, because that match must start with _.
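The consumption is visible in the match spans. This sketch (assuming Python 3's re module) prints what the question's first pattern actually matched: the match ends at index 10, after the second _, so the next attempt starts at a digit and can satisfy neither \A nor _.

```python
import re

mystring = '_20180701_20190630'
pattern_or = r'(?:\A|_){1}([0-9]{4}[_]{0,1}[0-9]{2}[_]{0,1}[0-9]{2})(?:$|_){1}'

# Each match object records the consumed text (group 0) and its span.
matches = [(m.span(), m.group(0)) for m in re.finditer(pattern_or, mystring)]
print(matches)  # [((0, 10), '_20180701_')]
```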
You can use a lookahead to solve this:
(?:\A|_)(\d{4}_?\d{2}_?\d{2})(?=_|\Z)
By using a positive lookahead, i.e. (?=_|\Z), we only assert the presence of _ or \Z without actually matching (consuming) it.
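Repeating the span check with the lookahead version shows the difference, as a sketch assuming Python 3's re module (where \Z matches only at the very end of the string): the first match now stops at index 9, leaving the _ for the second match.

```python
import re

mystring = '_20180701_20190630'
pattern_lookahead = r'(?:\A|_)(\d{4}_?\d{2}_?\d{2})(?=_|\Z)'

# (span, consumed text, captured number) for every match.
matches = [(m.span(), m.group(0), m.group(1)) for m in re.finditer(pattern_lookahead, mystring)]
print(matches)
# [((0, 9), '_20180701', '20180701'), ((9, 18), '_20190630', '20190630')]

numbers = re.findall(pattern_lookahead, mystring)
print(numbers)  # ['20180701', '20190630']
```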
I have also refactored your regex to simplify it: {1} can be removed, {0,1} can be replaced with just ? (optional match), [_] can be just _, and [0-9] can be shortened to \d (in Python 3, \d on str patterns also matches non-ASCII decimal digits unless re.ASCII is used).
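One way to sanity-check the refactoring is to run the long and short forms over a few inputs and compare. This is a sketch assuming Python 3's re module; the sample strings and the Arabic-Indic test string are made up for illustration. The last part checks the one place where \d and [0-9] can differ.

```python
import re

# Question's pattern, with only the trailing group replaced by the lookahead.
long_form = r'(?:\A|_){1}([0-9]{4}[_]{0,1}[0-9]{2}[_]{0,1}[0-9]{2})(?=_|\Z)'
# Simplified form from this answer.
short_form = r'(?:\A|_)(\d{4}_?\d{2}_?\d{2})(?=_|\Z)'

samples = ['_20180701_20190630', '20180701_20190630', '_2018_07_01_2019_06_30', 'x20180701']
same = all(re.findall(long_form, s) == re.findall(short_form, s) for s in samples)
print(same)  # True

# "20180701" written with Arabic-Indic digits (U+0660..U+0669).
arabic_indic = '_\u0662\u0660\u0661\u0668\u0660\u0667\u0660\u0661'
print(re.findall(long_form, arabic_indic))              # []
print(len(re.findall(short_form, arabic_indic)))        # 1
print(re.findall(short_form, arabic_indic, re.ASCII))   # []
```

For ASCII-only input the two forms are interchangeable; passing re.ASCII makes \d exactly equivalent to [0-9].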
Upvotes: 2