user12809368

Filtering rows in pandas

I would need to find the common users which have

The dataset looks like:

User_ID       CODE

A12          AAada __fa
F453         21 ads
J43          Has AA 
...          ...
H21          MNasdf
L32          sad 21
M54          43__12 asd
...          ...

What I should do is:

I have tried filtering the users with regex, using ^[^0-9]*$ in the case of numbers (though df.CODE.str.contains('^\d+\') would also be fine) and /[$-/:-?{-~!"^_[]]/ in the case of __.

Upvotes: 1

Views: 76

Answers (2)

RichieV

Reputation: 5183

You can use the string accessor method Series.str.contains(). The pandas user guide covers it in more detail.

Here is the code for your case:

pats = ['AA', '__', r'\d']  # raw string so \d reaches the regex engine unchanged
mask = {}
for pat in pats:
    # regex=True is the default; it is shown here for demonstration
    mask[pat] = df.CODE.str.contains(pat, regex=True)

    print()
    print(mask[pat])

Output

0     True
1    False
2     True
3    False
4    False
5    False
Name: CODE, dtype: bool

0     True
1    False
2    False
3    False
4    False
5     True
Name: CODE, dtype: bool

0    False
1     True
2    False
3    False
4     True
5     True
Name: CODE, dtype: bool

You can use each of these masks to filter the dataframe later on. In this case it is better to keep them as separate masks, since the matches appear to overlap.
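
As a rough sketch of how the masks might be used afterwards: the dataframe below is rebuilt from the six rows visible in the question (the "..." rows are left out), and the variable names are only illustrative. Single masks filter directly, and & / | combine them when you want to look at the overlaps.

```python
import pandas as pd

# Sample data copied from the rows shown in the question; the "..." rows are omitted.
df = pd.DataFrame({
    "User_ID": ["A12", "F453", "J43", "H21", "L32", "M54"],
    "CODE": ["AAada __fa", "21 ads", "Has AA", "MNasdf", "sad 21", "43__12 asd"],
})

pats = ["AA", "__", r"\d"]
mask = {pat: df["CODE"].str.contains(pat) for pat in pats}

# Filter with a single mask: users whose CODE contains a digit.
users_with_digit = df.loc[mask[r"\d"], "User_ID"].tolist()

# Combine masks with & (and) or | (or) to inspect the overlaps.
users_with_aa_and_underscores = df.loc[mask["AA"] & mask["__"], "User_ID"].tolist()
users_with_any = df.loc[mask["AA"] | mask["__"] | mask[r"\d"], "User_ID"].tolist()

print(users_with_digit)
print(users_with_aa_and_underscores)
print(users_with_any)
```

Keeping the masks in a dict keyed by pattern means you can decide later which combination you actually need without re-running the regex matching.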

Upvotes: 2

David Erickson

Reputation: 16683

You can pass a single regex to str.contains(), using | (or) to separate the three patterns:

df = df[df['CODE'].str.contains(r'\d|__|AA')]

Out[3]: 
  User_ID        CODE
0     A12  AAada __fa
1    F453      21 ads
2     J43      Has AA
5     L32      sad 21
6     M54  43__12 asd
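
If the list of patterns might grow, one possible variation is to build the alternation from a list instead of typing it by hand. This is only a sketch: the dataframe is rebuilt from the rows visible in the question, the names combined and filtered are illustrative, and na=False is an assumption on my part (the question shows no missing values) so that any NaN in CODE counts as a non-match.

```python
import pandas as pd

# Sample data copied from the rows shown in the question; the "..." rows are omitted.
df = pd.DataFrame({
    "User_ID": ["A12", "F453", "J43", "H21", "L32", "M54"],
    "CODE": ["AAada __fa", "21 ads", "Has AA", "MNasdf", "sad 21", "43__12 asd"],
})

pats = [r"\d", "__", "AA"]
combined = "|".join(pats)  # same regex as the hand-written one above

# na=False treats missing CODE values as non-matches (assumption, see lead-in).
filtered = df[df["CODE"].str.contains(combined, na=False)]
matching_users = filtered["User_ID"].tolist()

print(filtered)
```

Assigning to a new name such as filtered, rather than overwriting df, keeps the original rows around in case you want to apply a different pattern later.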

Upvotes: 4
