Vishakha Lall
Vishakha Lall

Reputation: 1314

Check if column is in a list, remove if not and add value to a new column

I have a database like this:

df = pd.DataFrame( { 
        "A" : [0,0,1,1,0,1] , 
        "B" : [1,0,0,1,1,0],
        "C" : [0,0,0,1,1,0],
        "D" : [1,1,0,0,0,1]} )

which looks like this:

    A   B   C   D
0   0   1   0   1
1   0   0   0   1
2   1   0   0   0
3   1   1   1   0
4   0   1   1   0
5   1   0   0   1

I have a list of columns I wish to keep allowed_columns = ["A","B"]. This means we get rid of C and D. However, when dropping the columns, if there is a value 1, I want to note that in a new column lost. This is what I'm trying to achieve:

    A   B   lost    
0   0   1   1   
1   0   0   1   
2   1   0   0   
3   1   1   1   
4   0   1   1   
5   1   0   1   

For ease of problem, we can assume that C and D cannot have value 1 simultaneously. How can I achieve this?

Upvotes: 2

Views: 156

Answers (5)

tianlinhe
tianlinhe

Reputation: 989

df['lost']=((df['C']==1)|(df['D']==1)).astype(int)
df.drop(['C','D'],axis=1,inplace=True)

You can use two booleans separated by OR to define the values in df['lost']! I think it is also intuitive, because

  • (df['C']==1)|(df['D']==1) will be True if you have 1 in either column C or column D; otherwise it will be False

  • astype(int) converts True to 1 and False to 0

Upvotes: 0

yatu
yatu

Reputation: 88276

You could use any:

c = df.columns.difference(allowed_columns)
df['lost'] = df[c].any(axis=1).view('i1')

print(df)

   A  B  C  D  lost
0  0  1  0  1     1
1  0  0  0  1     1
2  1  0  0  0     0
3  1  1  1  0     1
4  0  1  1  0     1
5  1  0  0  1     1

Upvotes: 1

piRSquared
piRSquared

Reputation: 294488

groupby

d = dict.fromkeys({*df} - {*allowed_columns}, 'lost')
df.groupby(lambda x: d.get(x, x), axis=1).max()

   A  B  lost
0  0  1     1
1  0  0     1
2  1  0     0
3  1  1     1
4  0  1     1
5  1  0     1

Upvotes: 1

ALollz
ALollz

Reputation: 59579

Subset to the allowed columns, then take the max of everything you removed with df.columns.difference

df = (df[allowed_columns]
       .assign(lost=df[df.columns.difference(allowed_columns)].max(1)))

Upvotes: 3

BENY
BENY

Reputation: 323326

Let us do

df['Lost']=df[['C','D']].max(1)
df=df.drop(['C','D'],axis=1)

Upvotes: 2

Related Questions