Reputation: 19
Using Python regex, I am trying to match as many p characters as the digit first matched in the pattern.
Sample Input
1pp
2p
3ppp
4ppppppppp
Expected Output
1p
None
3ppp
4pppp
Code Tried
I have tried the following code, where I use a named group and give the name 'dig' to the matched digit. Now I want to use dig in the repetition {m}. But the following code does not find any match in pattern.
import re
pattern = "2pppp"
reTriple = r'((?P<dig>\d)p{(?P=dig)})'
regex = re.compile(reTriple, re.IGNORECASE)
matches = re.finditer(regex, pattern)
I think the problem is that the repetition {m} expects an int m, whereas dig is a string. But I can't find a way to concatenate an int into a string while keeping it an int! I tried casting as follows:
reTrip = '((?P<dig>\d)p{%d}'%int('(?P=dig)')+')'
But I get the following error:
ValueError: invalid literal for int() with base 10: '(?P=dig)'
I feel stuck. Can someone please guide me?
It's also weird that if I instead break up reTriple as follows (save the matched digit in a variable first, then concatenate this variable into reTriple), it works and the expected output is achieved. But this is a workaround, and I am looking for a better method.
reTriple = '(?P<dig>\d)'
dig = re.search(reTriple , pattern).group('dig')
reTriple = reTriple + '(p{1,' + dig + '})'
Upvotes: 1
Views: 468
Reputation: 147146
Here's a single-step regex solution which uses a lambda function to check whether there are enough p's to match the digits at the beginning of the string; if there are, it returns the appropriate string (e.g. 1p or 3ppp), otherwise it returns an empty string:
import re

strs = ['1pp',
        '2p',
        '3ppp',
        '4ppppppppp'
        ]
for s in strs:
    print(re.sub(r'^(\d+)(p+).*', lambda m: m.group(1) + m.group(2)[:int(m.group(1))] if len(m.group(2)) >= int(m.group(1)) else '', s))
Output (for 2p an empty line is printed, not shown here):
1p
3ppp
4pppp
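If None is preferred over an empty string for the failing case, as in the question's expected output, the same check can be sketched with re.match and a small function instead of re.sub. The name match_counted is hypothetical and only used for illustration:

```python
import re

def match_counted(s):
    """Return the leading digits plus that many p's, or None if too few p's follow."""
    m = re.match(r'(\d+)(p+)', s)
    if m is None:
        return None
    count = int(m.group(1))
    # Reject the string when fewer p's are present than the number asks for.
    if len(m.group(2)) < count:
        return None
    return m.group(1) + 'p' * count

for s in ['1pp', '2p', '3ppp', '4ppppppppp']:
    print(match_counted(s))
```

This prints 1p, None, 3ppp and 4pppp, one per line.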
Upvotes: 0
Reputation: 615
Hi, you can take another approach without regex, something like this:
from typing import Union

def test(txt: str, var: str = 'p') -> Union[str, None]:
    var_count = txt.count(var)
    number = int(txt[0:len(txt) - var_count])
    if number <= var_count:
        return f'{number}{number * var}'
    return None
Let's test it; the output follows the code:
t = ['1pp', '2p', '3ppp', '4ppppppppp', '10pppppppppp']
for i in t:
    print(test(i))
1p
None
3ppp
4pppp
10pppppppppp
Upvotes: 0
Reputation: 26039
You can also do pure string operations without depending on any module for the mentioned strings in the question (digits < 10):
def val_txt(txt):
    dig = int(txt[0])
    rest_val = 'p' * dig
    return f'{dig}{rest_val}' if txt[1:1+dig] == rest_val else None

print(val_txt('1ppp'))
# 1p
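If numbers of 10 or more ever need to be handled, the same idea could be extended by scanning the leading digits by hand. This is only a sketch; val_txt_multi and its char parameter are hypothetical names, not part of the code above:

```python
def val_txt_multi(txt, char='p'):
    # Find where the leading run of ASCII digits ends, without any module.
    i = 0
    while i < len(txt) and txt[i] in '0123456789':
        i += 1
    if i == 0:
        return None  # the string does not start with a number
    count = int(txt[:i])
    expected = char * count
    # Keep the digits as written and require exactly `count` chars right after them.
    return txt[:i] + expected if txt[i:i + count] == expected else None

print(val_txt_multi('10' + 'p' * 12))
print(val_txt_multi('2p'))
```

The first call prints 10 followed by ten p's, and the second prints None.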
Upvotes: 1
Reputation: 75840
It seems that what you are trying to do basically comes down to (\d+)p{\1}, where you would use capture group 1 as input for how often "p" needs to match. However, capture group 1 seems to be returned as text (not as a number), so you find no results. Have a look here for example.
Maybe it helps to split this into two operations. For example:
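import re

def val_txt(txt):
    # Step 1: read the leading number from the string.
    i = int(re.search(r'\d+', txt).group(0))
    # Step 2: build a pattern that requires exactly i occurrences of "p".
    fnd = re.compile(fr'(?i)\d+p{{{i}}}')
    m = fnd.search(txt)
    if m:
        return m.group(0)

print(val_txt('2p'))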
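To illustrate why the single-pattern version finds nothing: as far as I can tell, Python's re only accepts literal integers inside {m}, so braces around a backreference are treated as ordinary characters. The sketch below shows that behaviour and a guarded variant of the two-step approach; counted_match is a hypothetical name used only here:

```python
import re

# The braces do not form a valid {m} quantifier, so re matches them literally.
literal = re.compile(r'(\d)p{\1}')
print(literal.search('2pp'))             # None: no literal "{2}" in the text
print(literal.search('2p{2}').group(0))  # matches the text "2p{2}" itself

def counted_match(txt):
    """Two-step sketch: read the number first, then build the {m} quantifier from it."""
    head = re.match(r'\d+', txt)
    if head is None:
        return None  # no leading number, so nothing to count against
    n = int(head.group(0))
    # Doubled braces in an f-string produce single literal braces: \d+p{n}
    m = re.match(rf'\d+p{{{n}}}', txt, re.IGNORECASE)
    return m.group(0) if m else None

print(counted_match('4ppppppppp'))
```

The guard on head avoids the AttributeError that would otherwise occur when the input contains no digits.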
Upvotes: 1