xingyuan jiang
xingyuan jiang

Reputation: 153

How to drop all rows that don't contain A in this DataFrame?

I have this Dataframe:

           C_M       C_N     C_Y
 0         100       A        A
 1         200       C        A,D                 
 2         300       B        A,C,E
 3         400       A        B,C,A
 4         500       A        A,D,E

How get I can I only keep the rows where the column C_N and column C_Y have "A"s?

This would be the desired outcome:

           C_M       C_N     C_Y
 0         100       A        A
 1         400       A        B,C,A
 2         500       A        A,D,E

Upvotes: 1

Views: 57

Answers (3)

MSeifert
MSeifert

Reputation: 152667

You could use boolean masking and then reset the index

# assuming your original DataFrame is named `df`
>>> df = df[(df.C_N == 'A') & (df.C_Y.str.contains('A'))].reset_index(drop=True)
>>> df
   C_M C_N    C_Y
0  100   A      A
1  400   A  B,C,A
2  500   A  A,D,E

In case there could be values like BA,C,D in C_Y that you don't want to match, then you could use a regex as argument for str.contains:

df[(df.C_N == 'A') & (df.C_Y.str.contains(r'(?:^A$)|(?:^A,)|(?:,A,)|(?:,+A$)'].reset_index(drop=True)

The regex checks if it only contains an A (nothing else) or starts with an A, or contains an ,A, or ends with an ,A

Upvotes: 1

Alexander
Alexander

Reputation: 109546

To account for cases where column C_Y can contain compound letters (e.g. 'BA'), this solution splits that columns and ensures that the value A is contained as a unique element within the list:

>>> df.loc[(df['C_N'] == 'A') & (df['C_Y'].str.split(',').apply(lambda row: 'A' in row))]
   C_M C_N    C_Y
0  100   A      A
3  400   A  B,C,A
4  500   A  A,D,E

Per the sample data, it assumes no spacing in column C_Y.

Upvotes: 1

Zero
Zero

Reputation: 76927

Here's one way

In [1312]: df[(df.C_N == 'A') & (df.C_Y.str.contains('A'))]
Out[1312]:
   C_M C_N    C_Y
0  100   A      A
3  400   A  B,C,A
4  500   A  A,D,E

another

In [1317]: df[df[['C_N', 'C_Y']].apply(lambda x: x.str.contains('A')).all(1)]
Out[1317]:
   C_M C_N    C_Y
0  100   A      A
3  400   A  B,C,A
4  500   A  A,D,E

Upvotes: 0

Related Questions