proximacentauri
proximacentauri

Reputation: 1879

Pandas select on multiple columns then replace

I am trying to do a multiple column select then replace in pandas

df:

a  b  c  d  e
0  1  1  0  none
0  0  0  1  none
1  0  0  0  none
0  0  0  0  none

select where any or all of a, b, c, d are non zero

i, j = np.where(df)
s=pd.Series(dict(zip(zip(i, j), 
  df.columns[j]))).reset_index(-1, drop=True)

s:

0   b
0   c
1   d
2   a

Now I want to replace the values in column e by the series:

df['e'] = s.values

so that e looks like:

e:

b, c 
d
a
none

But the problem is that the lengths of the series are different to the number of rows in the dataframe.

Any idea on how I can do this?

Upvotes: 2

Views: 428

Answers (2)

DYZ
DYZ

Reputation: 57105

You can locate the 1's and use their locations as boolean indexes into the dataframe columns:

df['e'] = (df==1).apply(lambda x: df.columns[x], axis=1)\
                 .str.join(",").replace('','none')
#   a  b  c  d     e
#0  0  1  1  0   b,c
#1  0  0  0  1     d
#2  1  0  0  0     a
#3  0  0  0  0  none   

Upvotes: 2

jezrael
jezrael

Reputation: 863291

Use DataFrame.dot for product with columns names, add rstrip, last add numpy.where for replace empty strings to None:

e = df.dot(df.columns + ', ').str.rstrip(', ')
df['e'] = np.where(e.astype(bool), e, None)
print (df)
   a  b  c  d     e
0  0  1  1  0  b, c
1  0  0  0  1     d
2  1  0  0  0     a
3  0  0  0  0  None

Upvotes: 2

Related Questions