Reputation: 71
I have a Python class with several methods, one of which is public.
How can I use this method to filter rows of a Pandas DataFrame?
I mean something like this:
class Math:
    def isTrue(value):
        return True
Pandas rule:
df[df["name"].apply(Math.isTrue)]
If the method returns True for a value in the name column,
that row should appear in the resulting dataframe.
The data is:
Number  Name    Country
1       Vasila  US
1212    oLGA    AU
6       Fors    RE
I need to keep only the rows where Number has a doubled pair of digits, like 1212,
using a regex-based method from my custom class.
The result should be:
Number  Name    Country
1212    oLGA    AU
Upvotes: 1
Views: 62
Reputation: 1478
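import re

class Matcher:
    # Raw string so that \d reaches the regex engine unchanged
    pattern = re.compile(r'^(?P<num_pair>\d\d)(?(num_pair)(?P=num_pair))$')

    @classmethod
    def has_num_pair(cls, n: int) -> bool:
        # Convert the number to text and test it against the pattern
        return cls.pattern.match(str(n)) is not None

# Keep only the rows whose Number passes the check
df[df['Number'].apply(Matcher.has_num_pair)]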
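To see the filter working end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the class method to it. The DataFrame construction and the names `mask` and `result` are my own assumptions for illustration; the pattern and the method are the ones shown above.

```python
import re

import pandas as pd


class Matcher:
    # Same pattern as in the answer, written as a raw string.
    pattern = re.compile(r'^(?P<num_pair>\d\d)(?(num_pair)(?P=num_pair))$')

    @classmethod
    def has_num_pair(cls, n: int) -> bool:
        # True when the text form of n is one pair of digits repeated once.
        return cls.pattern.match(str(n)) is not None


# Sample data copied from the question (construction is assumed).
df = pd.DataFrame({
    "Number": [1, 1212, 6],
    "Name": ["Vasila", "oLGA", "Fors"],
    "Country": ["US", "AU", "RE"],
})

# apply() calls the method once per value and returns a boolean Series,
# which is then used as a row mask.
mask = df["Number"].apply(Matcher.has_num_pair)
result = df[mask]
print(result)
```

Only the 1212 row should remain. Because `has_num_pair` converts its argument with `str()`, it does not matter whether the column holds integers or strings.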
Regex explanation:
pattern = re.compile(
    r'^'                 # at the beginning of the string
    r'(?P<num_pair>'     # create a capturing group named "num_pair" ...
    r'\d\d)'             # ... that captures two digits
    r'(?(num_pair)'      # if the group "num_pair" captured something ...
    r'(?P=num_pair))'    # ... try to match the captured content again
    r'$'                 # the string must end after that
)
This pattern will match numbers that are made of a repeated pair of digits, like 1212, 9898 or 3535, but it will not match numbers that include such a pair along with other digits, like 14343 for example. If you want to match those too, change your regex like this:
pattern = re.compile(r'.*(?P<num_pair>\d\d)(?(num_pair)(?P=num_pair)).*')
This variant will also match 14343, 767689 and so on.
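The difference between the two patterns can be checked without Pandas. The sketch below runs both against a few sample numbers; the names `strict`, `loose`, `compact` and the sample list are hypothetical and chosen for illustration. The `compact` pattern is not part of the answer above: it is a shorter form using a numbered backreference that, as far as I can tell, behaves the same way when used with `fullmatch` and `search` respectively.

```python
import re

# The two patterns from the answer, as raw strings.
strict = re.compile(r'^(?P<num_pair>\d\d)(?(num_pair)(?P=num_pair))$')
loose = re.compile(r'.*(?P<num_pair>\d\d)(?(num_pair)(?P=num_pair)).*')

# Assumed alternative: two digits followed by the same two digits again.
compact = re.compile(r'(\d\d)\1')

samples = [1, 6, 1212, 9898, 14343, 767689, 1234]

# Whole number must be one repeated pair.
strict_hits = [n for n in samples if strict.match(str(n))]
# Repeated pair may appear anywhere in the number.
loose_hits = [n for n in samples if loose.match(str(n))]

# The compact pattern covers both cases, depending on the method called.
compact_full = [n for n in samples if compact.fullmatch(str(n))]
compact_any = [n for n in samples if compact.search(str(n))]

print(strict_hits)  # [1212, 9898]
print(loose_hits)   # [1212, 9898, 14343, 767689]
```

Either compiled pattern can be dropped into the `pattern` attribute of `Matcher` without changing the rest of the class.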
Upvotes: 1