Reputation:
I'm sure this is in SO somewhere but I can't seem to find it. I'm trying to remove or select designated columns
in a pandas df
. But I want to keep certain values or strings
from those deleted columns
.
For the df
below I want to keep 'Big','Cat'
in Col B,C
but delete everything else.
import pandas as pd
d = ({
'A' : ['A','Keep','A','Value'],
'B' : ['Big','X','Big','Y'],
'C' : ['Cat','X','Cat','Y'],
})
df = pd.DataFrame(data=d)
If I do either the following it only selects that row.
Big = df[df['B'] == 'Big']
Cat = df[df['C'] == 'Cat']
My intended output is:
A B C
0 A Big Cat
1 Keep
2 A Big Cat
3 Value
I need something like x = df[df['B','C'] != 'Big','Cat']
Upvotes: 2
Views: 1007
Reputation: 285
Perhaps a concise version:
df.loc[df['B'] != 'Big', 'B'] = ''
df.loc[df['C'] != 'Cat', 'C'] = ''
print(df)
Output:
A B C
0 A Big Cat
1 Keep
2 A Big Cat
3 Value
Upvotes: 0
Reputation: 71610
Or this:
df[['B','C']]=df[['B','C']].apply(lambda row: row if row.tolist()==['Big','Cat'] else ['',''],axis=1)
print(df)
Output:
A B C
0 A Big Cat
1 Keep
2 A Big Cat
3 Value
Upvotes: 1
Reputation: 164843
You can filter on column combinations via NumPy and np.ndarray.all
:
mask = (df[['B', 'C']].values != ['Big', 'Cat']).all(1)
df.loc[mask, ['B', 'C']] = ''
print(df)
A B C
0 A Big Cat
1 Keep
2 A Big Cat
3 Value
Upvotes: 1
Reputation: 323396
IIUC
Update
df[['B','C']]=df[['B','C']][df[['B','C']].isin(['Big','Cat'])].fillna('')
df
Out[30]:
A B C
0 A Big Cat
1 Keep
2 A Big Cat
3 Value
Upvotes: 2
Reputation: 59304
Seems like you want to keep only some values and have empty string on ohters
Use np.where
keeps = ['Big', 'Cat']
df['B'] = np.where(df.B.isin(keeps), df.B, '')
df['C'] = np.where(df.C.isin(keeps), df.C, '')
A B C
0 A Big Cat
1 Keep
2 A Big Cat
3 Value
Another solution using df.where
cols = ['B', 'C']
df[cols] = df[cols].where(df.isin(keeps)).fillna('')
A B C
0 A Big Cat
1 Keep
2 A Big Cat
3 Value
Upvotes: 2