I need to find the common users which have __ in CODE. The dataset looks like:
User_ID CODE
A12 AAada __fa
F453 21 ads
J43 Has AA
... ...
H21 MNasdf
L32 sad 21
M54 43__12 asd
... ...
What I should do is find the users with __ in CODE, i.e. A12 and M54.
I have tried filtering the users using the regex ^[^0-9]*$ in case of numbers (df.CODE.str.contains('^\d+') would also be fine) and /[$-/:-?{-~!"^_[]]/ in case of __.
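To see why the attempted patterns fall short, here is a minimal sketch that rebuilds only the visible sample rows (the "..." rows are left out, so this six-row frame is an assumption) and applies each pattern with str.contains, which searches for a match anywhere in the string. Note that ^[^0-9]*$ matches codes made entirely of non-digits, i.e. the opposite of "contains a number", and ^\d+ only matches codes that start with a digit, so "sad 21" is missed. A plain \d finds a digit anywhere, and __ has no special regex meaning, so it can be matched literally.

```python
import pandas as pd

# Visible sample rows from the question (the "..." rows are omitted; assumption)
df = pd.DataFrame({
    "User_ID": ["A12", "F453", "J43", "H21", "L32", "M54"],
    "CODE": ["AAada __fa", "21 ads", "Has AA", "MNasdf", "sad 21", "43__12 asd"],
})

# '^[^0-9]*$' is True for codes WITHOUT any digit
no_digits = df["CODE"].str.contains(r"^[^0-9]*$")

# '^\d+' is True only when the code STARTS with a digit
starts_with_digit = df["CODE"].str.contains(r"^\d+")

# '\d' is True when a digit appears anywhere in the code
has_digit = df["CODE"].str.contains(r"\d")

# '__' is matched literally; regex=False skips regex parsing entirely
has_double_underscore = df["CODE"].str.contains("__", regex=False)

print(df.assign(no_digits=no_digits,
                starts_with_digit=starts_with_digit,
                has_digit=has_digit,
                has_double_underscore=has_double_underscore))
```

Only the last two columns answer "contains a number" and "contains __" for every row.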
Upvotes: 1
Views: 76
Reputation: 5183
You can use the string accessor for Series, Series.str.contains(). Here is the user guide.
And here is the code for your solution:
pats = ['AA', '__', r'\d']  # raw string so '\d' reaches the regex engine unchanged
mask = {}
for pat in pats:
    # regex=True is the default; it is shown here for demonstration
    mask[pat] = df.CODE.str.contains(pat, regex=True)
    print()
    print(mask[pat])
Output:

0 True
1 False
2 True
3 False
4 False
5 False
Name: CODE, dtype: bool

0 True
1 False
2 False
3 False
4 False
5 True
Name: CODE, dtype: bool

0 False
1 True
2 False
3 False
4 True
5 True
Name: CODE, dtype: bool
You can use each of these masks to filter the dataframe later on. Here it is worth keeping them as separate masks, since they overlap: row 0 (A12) matches both AA and __, and row 5 (M54) matches both __ and \d.
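Separate boolean masks can be combined with | (or) and & (and) before filtering. The sketch below rebuilds the six rows the output above appears to be based on (an assumption) and shows two combinations. The second one is a hypothetical reading of "common users": CODE contains __ together with AA or a digit. That reading reproduces the A12 and M54 expected in the question, but the question does not state it outright.

```python
import pandas as pd

# Six sample rows matching the mask output above (assumption)
df = pd.DataFrame({
    "User_ID": ["A12", "F453", "J43", "H21", "L32", "M54"],
    "CODE": ["AAada __fa", "21 ads", "Has AA", "MNasdf", "sad 21", "43__12 asd"],
})

pats = ["AA", "__", r"\d"]
mask = {pat: df["CODE"].str.contains(pat, regex=True) for pat in pats}

# Union: rows whose CODE matches at least one of the patterns
any_match = mask["AA"] | mask["__"] | mask[r"\d"]

# Hypothetical "common users": '__' AND (either 'AA' or a digit)
common = mask["__"] & (mask["AA"] | mask[r"\d"])

print(df.loc[any_match, "User_ID"].tolist())
print(df.loc[common, "User_ID"].tolist())
```

The parentheses around the | expression matter, because & binds more tightly than | in Python.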
Upvotes: 2
Reputation: 16683
You can use str.contains() with a single pattern in which | (regex alternation, i.e. "or") separates the three patterns:
df = df[df['CODE'].str.contains(r'\d|__|AA')]
Out[3]:
User_ID CODE
0 A12 AAada __fa
1 F453 21 ads
2 J43 Has AA
5 L32 sad 21
6 M54 43__12 asd
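A minimal sketch of the same one-pattern filter that also pulls out the matching User_ID values as a list. It assumes CODE might contain missing values: the extra row X99 with no CODE is hypothetical. Passing na=False makes str.contains treat a missing value as "no match", so the mask stays purely boolean and can be used for indexing. The last lines check that the single alternation gives the same mask as OR-ing one mask per pattern.

```python
import pandas as pd

# Sample rows from the question plus a hypothetical row with a missing CODE
df = pd.DataFrame({
    "User_ID": ["A12", "F453", "J43", "H21", "L32", "M54", "X99"],
    "CODE": ["AAada __fa", "21 ads", "Has AA", "MNasdf", "sad 21", "43__12 asd", None],
})

# '|' is regex alternation: a digit, '__' or 'AA' anywhere in CODE
# na=False turns the missing CODE into False instead of NaN
mask = df["CODE"].str.contains(r"\d|__|AA", regex=True, na=False)

matched_users = df.loc[mask, "User_ID"].tolist()
print(matched_users)

# Same result as combining one mask per pattern with |
separate = (df["CODE"].str.contains("AA", na=False)
            | df["CODE"].str.contains("__", na=False)
            | df["CODE"].str.contains(r"\d", na=False))
print(mask.equals(separate))
```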
Upvotes: 4