Reputation: 43169
This is a follow-up of this question (not asked by me though). Trying to answer, I ran into a couple of problems.
Consider the string strings123[abc789<span>123</span>def<span>456</span>000]strings456: how would one match the digits in square brackets that are not surrounded by span tags in Python (using the newer regex module)? In the example string, this would be 789 and 000.
I tried \G, like (demo):
(?:\G(?!\A)|\[)
[^\d\]]*
\K
\d+
and (*SKIP)(*FAIL) (demo):
<span>.*?</span>(*SKIP)(*FAIL)
|
\d+
But I was unable to combine both statements:
<span>.*?</span>(*SKIP)(*FAIL)
|
(?:
(?:\G(?!\A)|\[)
[^\d\]]*
(\d+)
[^\d\]]*
\K
)
How can this be done?
Upvotes: 3
Views: 75
Reputation: 627087
One of the things I like about the PyPI regex module is that it supports infinite-width lookbehind:
- Variable-length lookbehind
A lookbehind can match a variable-length string.
>>> import regex
>>> s = 'strings123[abc789<span>123</span>def<span>456</span>000]strings456'
>>> rx = r'(?<=\[[^][]*)(?:<span>[^<]*</span>(*SKIP)(?!)|\d+)(?=[^][]*])'
>>> regex.findall(rx, s)
['789', '000']
>>>
Pattern details:
- (?<=\[[^][]*) - there must be a [ followed with zero or more chars other than ] and [ immediately to the left of the current location
- (?: - a non-capturing group start
- <span>[^<]*</span>(*SKIP)(?!) - match a <span>, then 0+ chars other than < (with a [^<]* negated character class), and then a </span>, and discard the match while staying at the match end position, and go on to look for the next match
- | - or
- \d+ - 1+ digits
- (?=[^][]*]) - there must be a ] after zero or more chars other than ] and [ immediately to the right of the current location.
Upvotes: 3
Reputation: 2748
I thought of the following algorithm.
Search for the square brackets and their contents, and store the result in a variable. The regex would be \[[^]]*\].
Now search for the <span> tags and replace each one with -, just to simplify the next step. The regex would be (<span>.*?</span>).
You are now left with the contents of the square brackets minus whatever was inside <span> tags. Simply search with \d+ to match the digits.
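The three steps above could be sketched roughly as follows. This is only an illustration using the standard re module (the third-party regex module is not needed for this approach); the function name digits_outside_spans is made up for the example, and it assumes the brackets are not nested and the span tags carry no attributes.

```python
import re

s = 'strings123[abc789<span>123</span>def<span>456</span>000]strings456'

def digits_outside_spans(text):
    """Return digit runs inside [...] that are not wrapped in <span> tags.

    Hypothetical helper name; assumes non-nested brackets and plain <span> tags.
    """
    results = []
    # Step 1: grab each bracketed chunk, brackets included.
    for chunk in re.findall(r'\[[^]]*\]', text):
        # Step 2: replace every <span>...</span> with '-' so its digits disappear.
        cleaned = re.sub(r'<span>.*?</span>', '-', chunk)
        # Step 3: any digits that remain were outside the span tags.
        results.extend(re.findall(r'\d+', cleaned))
    return results

print(digits_outside_spans(s))  # ['789', '000']
```

Note that this returns only the matched text, not the positions in the original string, since the digits are found in a modified copy of each bracketed chunk.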
Upvotes: 1