Reputation: 7113
Is is possible to write a regex where I can somehow refer the "length of the first capture group" later in the same regex? What I am trying to achieve here is to capture continuous occurrences of 1
's that are followed by the exact number of continuous occurrences of 2
's.
I want something like
r"(1*)(2{length(\1)})" # where `length(\1)` should give me the length of capture group 1
Should Match
1122 # two 1's followed by two 2's
111222 # three 1's followed by three 2's
121122111222 # should match `12` and `1122` and `111222` separately
Should Not Match
122 # there are two 2's following one 1
112 # there are two 1's but only one 2
11222 # same as above but with different occurrences
11122 # same as above but with different occurrences
Upvotes: 3
Views: 3241
Reputation:
Update I guess you could go with some absurd Java lookahead recursion simulation that won't work
or you could use Python to do it ?
>>> import regex
>>> rx_1_2 = r"(?m)^(1(?>(?1))*2)$"
>>>
>>> input = '''
... 111222222
... 11222234
... 1111222
... 111222
... 1122
... 12
... '''
>>> res = regex.findall( rx_1_2, input )
>>> print( res )
['111222', '1122', '12']
That this question was marked a duplicate of a Java simulated recursion
using lookaheads is astoundingly bad judgment on whoever covered this
question up by marking it a duplicate. Just plain poor judgment...
It can be done with pythons regex module.
Needs to use recursion.
Done this way because it is really just nested delimiters.
1
1
1
2
2
2
1(?>[^12]++|(?R))*2
https://regex101.com/r/4Nxtvl/1
# Recursion
1 # 1
(?> # Atomic group
[^12]++ # Possesive, not 1 or 2
| # or,
(?R) # Recurse the regex
)* # End cluster, do 0 to many times
2 # 2
To not allow inner content use 1(?>(?R))*2
https://regex101.com/r/mSUIp0/1
To add boundary conditions, contain the recursion to a group,
then surround it with boundary constructs.
(?<!\d)(1(?>[^12]++|(?1))*2)(?!\d)
https://regex101.com/r/SSr1zV/1
(?<! \d ) # Not a digit behind
( # (1 start), Recursion code group
1 # 1
(?> # Atomic group
[^12]++ # Possesive, not 1 or 2
| # or,
(?1) # Recurse the regex group 1
)* # End cluster, do 0 to many times
2 # 2
) # (1 end)
(?! \d ) # Not a digit ahead
To not allow inner content use (?<!\d)(1(?>(?1))*2)(?!\d)
https://regex101.com/r/VI6w0Y/1
Upvotes: 1
Reputation: 3301
If the largest number of sequential 1's is low enough then you can enumerate the options. something like:
r'(12)|(1122)|(1{3}2{3})' etc
You could even generate the regex. Long regexes can be surprisingly efficient if there isn't too much recursion.
for i in range(1:50):
regex += r"|1{" + str(i) + r"}2{" + str(i) + r"}"
You'll have to add boundaries as required as well.
If you don't mind making two passes, you can get the length from the re match object:
ONES = re.compile(r'1+')
match = ONES.search(string)
if match is not None:
length = match.end() - match.start()
TWOS = re.compile(r'2{' + str(length) + r'}')
string = string[match.end():]
match = TWOS.search(string)
...
If you're willing to not use regex, consider splitting into a list and using list operations
Upvotes: 0