Reputation: 19

Python regex that repeats \d number of times

Using Python regex, I am trying to match as many p's as the digit matched first in the pattern.

Sample Input

1pp
2p
3ppp
4ppppppppp

Expected Output

1p
None
3ppp
4pppp

Code Tried

I have tried the following code, where I use a named group and give the name 'dig' to the matched digit. Now I want to use dig as the repetition count in {m}, but the following code does not find any match in pattern.

import re

pattern = "2pppp"
reTriple = r'((?P<dig>\d)p{(?P=dig)})'
regex = re.compile(reTriple, re.IGNORECASE)
matches = re.finditer(regex, pattern)

I think the problem is that the repetition {m} expects an int m, whereas dig is a string. But I can't find a way to insert the matched digit into the pattern string as an int. I tried casting as follows:

reTrip = '((?P<dig>\d)p{%d}'%int('(?P=dig)')+')'

But I get the following error:

ValueError: invalid literal for int() with base 10: '(?P=dig)'

I feel stuck. Can someone please guide me?

It's also weird that if I instead break reTriple up as follows (save the matched digit in a variable first, then concatenate that variable into reTriple), it works and the expected output is achieved. But this is a workaround, and I am looking for a better method.

reTriple = r'(?P<dig>\d)'
dig = re.search(reTriple, pattern).group('dig')
reTriple = reTriple + '(p{1,' + dig + '})'

Upvotes: 1

Answers (4)

Nick

Reputation: 147146

Here's a single step regex solution which uses a lambda function to check if there are sufficient p's to match the digits at the beginning of the string; if there are it returns the appropriate string (e.g. 1p or 3ppp), otherwise it returns an empty string:

import re

strs = ['1pp',
        '2p',
        '3ppp',
        '4ppppppppp'
        ]

for s in strs:
    print(re.sub(r'^(\d+)(p+).*', lambda m: m.group(1) + m.group(2)[:int(m.group(1))] if len(m.group(2)) >= int(m.group(1)) else '', s))

Output:

1p

3ppp
4pppp
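If you would rather get None than an empty string when there are too few p's, the same idea can be written as a small function. This is only a sketch: the name trim_to_count is made up for illustration, and it assumes the digits sit at the very start of the string.

```python
import re

def trim_to_count(s):
    """Return the digits plus exactly that many p's, or None if there are too few."""
    # Capture the leading number and the run of p's that follows it
    m = re.match(r'(\d+)(p+)', s)
    if m is None:
        return None
    n = int(m.group(1))
    ps = m.group(2)
    # Not enough p's to satisfy the count
    if len(ps) < n:
        return None
    # Keep only the first n p's
    return m.group(1) + ps[:n]

for s in ['1pp', '2p', '3ppp', '4ppppppppp']:
    print(trim_to_count(s))
```

For the four sample inputs this prints 1p, None, 3ppp and 4pppp.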

Upvotes: 0

Beny Gj

Reputation: 615

Another approach is to do this without regex, something like this:

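from typing import Union

def test(txt: str, var: str = 'p') -> Union[str, None]:
    # Count the occurrences of var, then read the number that precedes them
    var_count = txt.count(var)
    number = int(txt[:len(txt) - var_count])
    # Only succeed when there are at least `number` occurrences of var
    if number <= var_count:
        return f'{number}{number * var}'

    return None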

Let's test it:

t = ['1pp', '2p', '3ppp', '4ppppppppp', '10pppppppppp']

for i in t:
    print(test(i))

1p
None
3ppp
4pppp
10pppppppppp

Upvotes: 0

Austin

Reputation: 26039

You can also do pure string operations without depending on any module for the mentioned strings in the question (digits < 10):

def val_txt(txt):
    dig = int(txt[0])
    rest_val = 'p' * dig
    return f'{dig}{rest_val}' if txt[1:1+dig] == rest_val else None

print(val_txt('1ppp'))
# 1p

Upvotes: 1

JvdV

Reputation: 75840

It seems that what you are trying to do basically comes down to (\d+)p{\1}, where capture group 1 is used as input for how often "p" needs to match. However, a backreference yields text (not a number), and the {m} quantifier needs a literal integer when the pattern is compiled, so you find no results. Have a look here for example.
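To see what Python's re actually does with the pattern from the question, here is a quick check. Because the braces do not contain a plain integer, they are not parsed as a quantifier at all; they are matched as literal { and } characters around the backreference. The variable names are just for illustration.

```python
import re

# The braces do not form a valid {m} quantifier, so they are literal characters
pat = re.compile(r'(?P<dig>\d)p{(?P=dig)}')

# No match: the text would have to contain literal braces
print(pat.search('2pppp'))

# The pattern matches this odd-looking text instead
literal = pat.search('2p{2}')
print(literal.group(0))
```

The first print shows None, and the second shows 2p{2}, which is the only kind of text this pattern can match.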

Maybe it helps to split this into two operations. For example:

import re

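def val_txt(txt):
    # Step 1: read the first number found in the text
    i = int(re.search(r'\d+', txt).group(0))
    # Step 2: build a pattern that requires exactly i p's after the digits
    fnd = re.compile(fr'(?i)\d+p{{{i}}}')
    m = fnd.search(txt)
    return m.group(0) if m else None

print(val_txt('2p'))
# None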
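One thing to be aware of: in the two-step version the number lookup and the pattern search are independent, so for a string like 'x3 2ppp' the count would come from the 3 while the match would start at the 2. If that matters for your data, a stricter variant could anchor both steps to the start of the string and reuse the same digits in the second pattern. This is a sketch under that assumption, and val_txt_anchored is a made-up name.

```python
import re

def val_txt_anchored(txt):
    # Step 1: the number must be at the very start of the text
    head = re.match(r'\d+', txt)
    if head is None:
        return None
    digits = head.group(0)
    # Step 2: require those same digits followed by exactly int(digits) p's
    fnd = re.compile(rf'{digits}p{{{int(digits)}}}', re.IGNORECASE)
    m = fnd.match(txt)
    return m.group(0) if m else None

for s in ['1pp', '2p', '3ppp', '4ppppppppp', 'x3 2ppp']:
    print(val_txt_anchored(s))
```

This prints 1p, None, 3ppp, 4pppp and None. Multi-digit counts work as well, since the whole leading number is used.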

Upvotes: 1
