Reputation: 423
I have some percentage values like the following:
12.02
16.59
81.61%
45
24.812
51.35
19348952
88.22
0
000
021
.85%
100
I want to match all the percentage values except anything larger than 100. Expected output:
12.02
16.59
81.61
45
24.812
51.35
88.22
0
000
21
.85
100
I have tried the pattern from "Regular Expression for Percentage of marks", but it fails to match all the cases that I want. I am also replacing each non-match with an empty string. My Python code looks like this:
pattern=r'(\b(?<!\.)(?!0+(?:\.0+)?%)(?:\d|[1-9]\d|100)(?:(?<!100)\.\d+)?$)'
df['Percent']=df['Percent'].astype(str).str.extract(pattern)[0]
Many thanks.
Edit: The solution (by @rv.kvetch) matches most of the edge cases except the 0 ones, but I can work with that limitation. The original post required that 0 and 0% not be matched.
Upvotes: 0
Views: 864
Reputation: 11621
I'm probably very close, but this looks like it is working for me so far:
^(?:0{0,})((?:[1-9]{1,2}|100)?(?:\.\d+)?)%?$
(?:0{0,})
- Non-capturing group which matches a leading 0 that appears zero or more times.
(?:[1-9]{1,2}|100)?
- Optional, non-capturing group which matches the digits 1-9 one to two times, to cover most of the range 1-99 (because [1-9] excludes 0, two-digit values containing a 0, such as 10, are not matched). Then an or condition so we also cover 100. This group is made optional by ? to cover cases like .24, which is still a valid percentage.
(?:\.\d+)?
- Optional, non-capturing group which matches the fractional part, e.g. .123. This is optional because numbers like 20 are valid percentage values by themselves.
%?
- Finally, here we match the optional trailing percent (%) symbol that can come at the end.
Here is a non-regex approach that should be more efficient than a regex approach. This also covers edge cases like .0 that the regex currently hasn't been updated to handle:
string = """
12.02
16.59
81.61%
45
24.812
51.35
19348952
88.22
0
000
.0%
021
.85%
100
150
1.2.3
hello world
"""
for n in string.split('\n'):
    try:
        x = float(n.rstrip('%'))
    except ValueError:  # invalid numeric value
        continue
    # Check if number is in the range 0..100 (non-inclusive of 0)
    if 0 < x <= 100:
        print(x)
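Returning to the regex at the top of this answer, a quick check against a few inputs shows where it falls short. The sketch below runs the pattern exactly as quoted, next to a hypothetical variant (`VARIANT_PATTERN` is my own assumption, not part of the original answer) that allows a 0 digit inside two-digit values and rejects fractions above 100. The helper name `extract` is also just for illustration.

```python
import re

# Pattern quoted from the answer above.
ANSWER_PATTERN = re.compile(r'^(?:0{0,})((?:[1-9]{1,2}|100)?(?:\.\d+)?)%?$')

# Hypothetical variant (an assumption, not from the answer): accepts values
# such as 10 or 20 and only allows zeros after the decimal point of 100.
VARIANT_PATTERN = re.compile(r'^0*(100(?:\.0+)?|[1-9]?\d(?:\.\d+)?|\.\d+)%?$')


def extract(pattern, text):
    """Return the captured number as a string, or None when nothing matches."""
    m = pattern.match(text)
    return m.group(1) if m else None


samples = ['12.02', '81.61%', '021', '.85%', '100', '10', '100.5', '19348952']
for s in samples:
    # Columns: input, result of the quoted pattern, result of the variant.
    print(s, extract(ANSWER_PATTERN, s), extract(VARIANT_PATTERN, s))
```

On these inputs the quoted pattern returns no match for 10 and accepts 100.5, while the variant does the opposite. Both strip the leading zero from 021 and the trailing % sign.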
Upvotes: 0
Reputation: 1265
If you want, you can do it without using regex.
nums = ['12.02',
        '16.59',
        '81.61%',
        '45',
        '24.812',
        '51.35',
        '19348952',
        '88.22',
        '0',
        '000',
        '021',
        '.85%',
        '100']
for n in nums:
    x = n.strip('%')  # drop the percent sign
    x = float(x)  # float, not int: int('12.02') raises ValueError
    if x <= 100:
        print(n)
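Since the question replaces non-matches with an empty string in a DataFrame column, the same idea can be wrapped in a small function. This is a minimal sketch: `clean_percent` is a hypothetical helper name, it treats 0 as valid (as in the question's expected output), and it keeps leading zeros as written, so 021 stays 021 rather than becoming 21.

```python
def clean_percent(value):
    """Return the value without '%' if it is a number in 0..100, else ''."""
    text = str(value).strip().rstrip('%')
    try:
        number = float(text)
    except ValueError:  # not a number at all, e.g. 'hello world' or '1.2.3'
        return ''
    # Out-of-range values (and NaN, which fails every comparison) become ''.
    return text if 0 <= number <= 100 else ''


values = ['12.02', '81.61%', '19348952', '000', '.85%', '100', '150']
cleaned = [clean_percent(v) for v in values]
print(cleaned)
```

With pandas, something like `df['Percent'].astype(str).map(clean_percent)` should apply it to the column from the question.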
Upvotes: 1