Reputation: 153
I have this Dataframe:
C_M C_N C_Y
0 100 A A
1 200 C A,D
2 300 B A,C,E
3 400 A B,C,A
4 500 A A,D,E
How get I can I only keep the rows where the column C_N
and column C_Y
have "A"s?
This would be the desired outcome:
C_M C_N C_Y
0 100 A A
1 400 A B,C,A
2 500 A A,D,E
Upvotes: 1
Views: 57
Reputation: 152667
You could use boolean masking and then reset the index
# assuming your original DataFrame is named `df`
>>> df = df[(df.C_N == 'A') & (df.C_Y.str.contains('A'))].reset_index(drop=True)
>>> df
C_M C_N C_Y
0 100 A A
1 400 A B,C,A
2 500 A A,D,E
In case there could be values like BA,C,D
in C_Y
that you don't want to match, then you could use a regex as argument for str.contains
:
df[(df.C_N == 'A') & (df.C_Y.str.contains(r'(?:^A$)|(?:^A,)|(?:,A,)|(?:,+A$)'].reset_index(drop=True)
The regex checks if it only contains an A
(nothing else) or starts with an A,
or contains an ,A,
or ends with an ,A
Upvotes: 1
Reputation: 109546
To account for cases where column C_Y
can contain compound letters (e.g. 'BA'), this solution splits that columns and ensures that the value A
is contained as a unique element within the list:
>>> df.loc[(df['C_N'] == 'A') & (df['C_Y'].str.split(',').apply(lambda row: 'A' in row))]
C_M C_N C_Y
0 100 A A
3 400 A B,C,A
4 500 A A,D,E
Per the sample data, it assumes no spacing in column C_Y
.
Upvotes: 1
Reputation: 76927
Here's one way
In [1312]: df[(df.C_N == 'A') & (df.C_Y.str.contains('A'))]
Out[1312]:
C_M C_N C_Y
0 100 A A
3 400 A B,C,A
4 500 A A,D,E
another
In [1317]: df[df[['C_N', 'C_Y']].apply(lambda x: x.str.contains('A')).all(1)]
Out[1317]:
C_M C_N C_Y
0 100 A A
3 400 A B,C,A
4 500 A A,D,E
Upvotes: 0