Reputation: 1314
I have a database like this:
df = pd.DataFrame( {
"A" : [0,0,1,1,0,1] ,
"B" : [1,0,0,1,1,0],
"C" : [0,0,0,1,1,0],
"D" : [1,1,0,0,0,1]} )
which looks like this:
A B C D
0 0 1 0 1
1 0 0 0 1
2 1 0 0 0
3 1 1 1 0
4 0 1 1 0
5 1 0 0 1
I have a list of columns I wish to keep allowed_columns = ["A","B"]
. This means we get rid of C
and D
. However, when dropping the columns, if there is a value 1, I want to note that in a new column lost
. This is what I'm trying to achieve:
A B lost
0 0 1 1
1 0 0 1
2 1 0 0
3 1 1 1
4 0 1 1
5 1 0 1
For ease of problem, we can assume that C
and D
cannot have value 1 simultaneously. How can I achieve this?
Upvotes: 2
Views: 156
Reputation: 989
df['lost']=((df['C']==1)|(df['D']==1)).astype(int)
df.drop(['C','D'],axis=1,inplace=True)
You can use two booleans separated by OR
to define the values in df['lost']
! I think it is also intuitive, because
(df['C']==1)|(df['D']==1)
will be True
if you have 1
in either
column C or column D; otherwise it will be False
astype(int)
converts True
to 1
and False
to 0
Upvotes: 0
Reputation: 88276
You could use any
:
c = df.columns.difference(allowed_columns)
df['lost'] = df[c].any(axis=1).view('i1')
print(df)
A B C D lost
0 0 1 0 1 1
1 0 0 0 1 1
2 1 0 0 0 0
3 1 1 1 0 1
4 0 1 1 0 1
5 1 0 0 1 1
Upvotes: 1
Reputation: 294488
groupby
d = dict.fromkeys({*df} - {*allowed_columns}, 'lost')
df.groupby(lambda x: d.get(x, x), axis=1).max()
A B lost
0 0 1 1
1 0 0 1
2 1 0 0
3 1 1 1
4 0 1 1
5 1 0 1
Upvotes: 1
Reputation: 59579
Subset to the allowed columns, then take the max of everything you removed with df.columns.difference
df = (df[allowed_columns]
.assign(lost=df[df.columns.difference(allowed_columns)].max(1)))
Upvotes: 3
Reputation: 323326
Let us do
df['Lost']=df[['C','D']].max(1)
df=df.drop(['C','D'],axis=1)
Upvotes: 2