Dexter
Reputation: 168

Regular expression to match a digit present n or more times before a character and m or fewer times after it

I need a regex to match a string based on two conditions using Python:

  1. a digit is present at least n times before a ,
  2. The digit matched from condition 1 is present at most m times after a ,

Note: there is only one comma.

For example:

111,222 with n = 3 and m = 0 should return true because 1 is present 3 or more times before , and 0 times after ,

111,212 with n = 3 and m = 0 should return false because although 1 is present 3 or more times before , it is also present more than 0 times after ,

111,212 with n = 3 and m = 1 should return true because 1 is present 3 or more times before , and only 1 time after ,

I use (\d+)\1{n,} to capture the digit and check the first condition. But I am having trouble with the second condition. I tried (\d+)\1{n,},\d*((?!\1)){0,m}\d* but it is not working.

I assume that the \d* after the , in the regular expression is matching the captured digit that should not appear. Any ideas?

Upvotes: 1

Views: 2925

Answers (2)

ctwheels
Reputation: 22837

Code

You're better off doing this in code without regex: split on , and then count how many times each digit occurs in both parts. In Python, it would be something like this (change the values of n and m to try other cases):

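ss = ['111,222', '111,212']
n, m = 3, 1
for s in ss:
    # split into the parts before and after the single comma
    x, y = s.split(',')
    for c in x:
        # c qualifies if it occurs >= n times before and <= m times after
        if (x.count(c) >= n) and (y.count(c) <= m):
            print(s)
            break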
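
The same split-and-count check can also be wrapped in a function, which makes it easy to run against the three examples from the question. This is only a sketch: the name satisfies is made up here, and it assumes the input contains exactly one comma.

```python
def satisfies(s, n, m):
    """Hypothetical helper: True if some digit occurs at least n times
    before the comma and at most m times after it."""
    before, after = s.split(',')  # assumes exactly one comma
    # set(before) avoids re-checking the same digit several times
    return any(before.count(d) >= n and after.count(d) <= m
               for d in set(before))

print(satisfies('111,222', 3, 0))  # True
print(satisfies('111,212', 3, 0))  # False
print(satisfies('111,212', 3, 1))  # True
```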

Regex

In regex, it can be accomplished with something like the following but it's really not ideal:

(\d)(?:(?:(?!\1)\d)*\1){2}\d*,(?:(?!\1)\d)*(?:\1(?:(?!\1)\d)*){0,1}$
#                       ^ n-1                                    ^ m

Since you only care that the digit meets the minimum requirement of n occurrences, there is no need for {2,}: the \d* before the comma absorbs any further occurrences.
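
If n and m vary, the pattern above could be assembled from them instead of being edited by hand. A rough sketch, assuming n >= 1 and m >= 0; build_pattern is a name invented for this example, and the pattern is meant to be used with re.search because it is not anchored at the start:

```python
import re

def build_pattern(n, m):
    """Hypothetical helper that fills n - 1 and m into the pattern above."""
    other = r'(?:(?!\1)\d)*'  # a run of digits that are not the captured one
    return re.compile(
        r'(\d)'                                        # first occurrence
        + r'(?:' + other + r'\1){' + str(n - 1) + r'}' # n - 1 further ones
        + r'\d*,'                                      # rest of the left part
        + other
        + r'(?:\1' + other + r'){0,' + str(m) + r'}$'  # at most m on the right
    )

print(bool(build_pattern(3, 0).search('111,222')))  # True
print(bool(build_pattern(3, 0).search('111,212')))  # False
print(bool(build_pattern(3, 1).search('111,212')))  # True
```

With n = 3 and m = 1 this produces exactly the pattern shown above.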

Upvotes: 3

The fourth bird
Reputation: 163477

In this part of the pattern, (\d+)\1{n,}, with n=3 you repeat what you have already captured 3 more times, so it will try to match at least 4 digits instead of 3.
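
A quick way to see the off-by-one is to compare the original quantifier with one that uses n - 1. This is just an illustration; the variable names and the ^ anchor are added for the example:

```python
import re

n = 3
# As in the question: the capture plus n repeats needs at least n + 1 digits.
as_asked = re.compile(r'^(\d+)\1{%d,},' % n)
# Capture a single digit, then n - 1 repeats, for n digits in total.
adjusted = re.compile(r'^(\d)\1{%d,},' % (n - 1))

print(bool(as_asked.search('111,222')))   # False
print(bool(as_asked.search('1111,222')))  # True
print(bool(adjusted.search('111,222')))   # True
```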

I would suggest not using {0,m}, but matching an exact number of times like {1} or {2} etc, and after you have matched the backreference to group 1 that many times, asserting that no more occurrences follow using a negative lookahead.

^(\d)\1{2,},(?=((?:\d*?\1){1}))\2(?!\d*\1)\d*
  • ^ Start of string
  • (\d)\1{2,}, Capture group 1, match a digit and repeat the backreference 2 or more times
  • (?= Positive lookahead
    • ( Capture group 2
      • (?:\d*?\1){1} Repeat matching the backreference to group 1 m times. Here m = 1
    • ) Close group
  • ) Close lookahead
  • \2 Match what is captured in group 2 to prevent backtracking
  • (?!\d*\1) Negative lookahead, assert what follows is no more occurrences of group 1
  • \d* Match 0+ digits

For example

import re

regex = r"^(\d)\1{2,},(?=((?:\d*?\1){1}))\2(?!\d*\1)\d*"
test_str = ("111,222\n"
    "111,212")

# re.MULTILINE lets ^ match at the start of each line in test_str
matches = re.finditer(regex, test_str, re.MULTILINE)

for match in matches:
    print(match.group())

Output

111,212
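
Because the quantifier is exact, 111,222 is not matched when it is {1}. One possible way to get "at most m" out of this approach is to try each exact count from 0 to m. The sketch below assumes n >= 1; exact_pattern and at_most are names made up for the example. As in the pattern above, the digit has to be repeated consecutively before the comma.

```python
import re

def exact_pattern(n, k):
    """Hypothetical helper: at least n consecutive copies of a digit
    before the comma and exactly k copies of it after the comma."""
    return re.compile(
        r'^(\d)\1{%d,},(?=((?:\d*?\1){%d}))\2(?!\d*\1)\d*' % (n - 1, k)
    )

def at_most(s, n, m):
    # Try every exact count k = 0..m and accept if any of them matches.
    return any(exact_pattern(n, k).search(s) for k in range(m + 1))

print(at_most('111,222', 3, 0))  # True
print(at_most('111,212', 3, 0))  # False
print(at_most('111,212', 3, 1))  # True
```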

Upvotes: 1
