Reputation: 168
I need a regex to match a string based on two conditions using Python:
A digit appears n or more times before the ,
That same digit appears no more than m times after the ,
Note: there is only one comma.
For example:
111,222 with n = 3 and m = 0 should return true because 1 is present 3 or more times before , and 0 times after ,
111,212 with n = 3 and m = 0 should return false because, although 1 is present 3 or more times before , it is present more than 0 times after ,
111,212 with n = 3 and m = 1 should return true because 1 is present 3 or more times before , and only 1 time after ,
I use (\d+)\1{n,} to capture the digit and check the first condition, but I am having trouble with the second condition. I tried (\d+)\1{n,},\d*((?!\1)){0,m}\d* but it is not working.
I assume that the \d after the , in the regular expression matches the captured digit that should not appear. Any ideas?
Upvotes: 1
Views: 2925
Reputation: 22837
You're better off doing this in code without regex: split on , and then count how many times each digit occurs in both parts. In Python, it would be something like this (change the values of n and m to try other cases):
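ss = ['111,222', '111,212']
n, m = 3, 1
for s in ss:
    # split into the parts before and after the single comma
    x, y = s.split(',')
    for c in x:
        # c must occur at least n times before and at most m times after
        if x.count(c) >= n and y.count(c) <= m:
            print(s)
            break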
In regex, it can be accomplished with something like the following but it's really not ideal:
(\d)(?:(?:(?!\1)\d)*\1){2}\d*,(?:(?!\1)\d)*(?:\1(?:(?!\1)\d)*){0,1}$
#                       ^ n-1                                    ^ m
Since you only care that it meets the minimum requirement of n, we don't need to use {2,}: an exact {2} is enough, because the \d* that follows absorbs any further digits before the comma.
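The n-1 and m positions marked above could be filled in programmatically. This is only a sketch: build_pattern and matches are hypothetical helper names, and it assumes n >= 1 and m >= 0.

```python
import re

def build_pattern(n, m):
    """Hypothetical helper: fill n and m into the pattern above (assumes n >= 1, m >= 0)."""
    return re.compile(
        r"(\d)(?:(?:(?!\1)\d)*\1){" + str(n - 1) + r"}\d*,"
        r"(?:(?!\1)\d)*(?:\1(?:(?!\1)\d)*){0," + str(m) + r"}$"
    )

def matches(s, n, m):
    # search (not match) so that any digit before the comma can be the candidate
    return build_pattern(n, m).search(s) is not None

print(matches("111,222", 3, 0))  # True
print(matches("111,212", 3, 0))  # False
print(matches("111,212", 3, 1))  # True
```

The three calls mirror the examples from the question.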
Upvotes: 3
Reputation: 163477
In this part of the pattern, (\d+)\1{n,}, if n = 3 you repeat what you have already captured 3 more times, so it will try to match at least 4 digits instead of 3.
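A quick check of that off-by-one, as a small sketch (the variable names are only illustrative):

```python
import re

# (\d)\1{3,} needs the captured digit plus 3 more copies: 4 in total.
too_strict = re.compile(r"(\d)\1{3,}")
# (\d)\1{2,} needs the captured digit plus 2 more copies: 3 in total.
three_or_more = re.compile(r"(\d)\1{2,}")

print(bool(too_strict.search("111")))     # False
print(bool(too_strict.search("1111")))    # True
print(bool(three_or_more.search("111")))  # True
```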
I would suggest not using {0,m}, but matching an exact number of times like {1} or {2} etc., and after you have matched the backreference to group 1, asserting that no more occurrences follow using a negative lookahead.
^(\d)\1{2,},(?=((?:\d*?\1){1}))\2(?!\d*\1)\d*
^  Start of string
(\d)\1{2,},  Capture group 1, match a digit and repeat the backreference 2 or more times, then match the comma
(?=  Positive lookahead
  (  Capture group 2
    (?:\d*?\1){1}  Repeat matching the backreference to group 1 m times. Here m = 1
  )  Close group
)  Close lookahead
\2  Match what is captured in group 2 to prevent backtracking
(?!\d*\1)  Negative lookahead, assert what follows is no more occurrences of group 1
\d*  Match 0+ digits
For example
import re

regex = r"^(\d)\1{2,},(?=((?:\d*?\1){1}))\2(?!\d*\1)\d*"
test_str = ("111,222\n"
            "111,212")

# re.MULTILINE lets ^ match at the start of each line of test_str
matches = re.finditer(regex, test_str, re.MULTILINE)
for match in matches:
    print(match.group())
Output
111,212
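To try other values of n and m, the pattern could be built from parameters. This is a rough sketch: exact_pattern is a hypothetical helper name, and it assumes n >= 1 and m >= 0. Note that with {m} the digit must occur exactly m times after the comma, so 111,222 is rejected for m = 1, whereas a {0,m} quantifier would accept it.

```python
import re

def exact_pattern(n, m):
    # Hypothetical helper: at least n consecutive copies of a digit at the
    # start, then exactly m occurrences of that digit after the comma.
    return re.compile(
        r"^(\d)\1{" + str(n - 1) + r",},"
        r"(?=((?:\d*?\1){" + str(m) + r"}))\2(?!\d*\1)\d*"
    )

print(bool(exact_pattern(3, 0).search("111,222")))  # True
print(bool(exact_pattern(3, 0).search("111,212")))  # False
print(bool(exact_pattern(3, 1).search("111,212")))  # True
print(bool(exact_pattern(3, 1).search("111,222")))  # False
```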
Upvotes: 1