astroluv

Reputation: 793

How to create a new column containing the names of columns that are NaN in pandas?

If I have a DataFrame like this:

   A     B      C
 Nan   1.0    0.0
 1.0   Nan    1.0
 1.0   0.0    Nan

I want to create a new column in the DataFrame that lists, for each row, which columns contain NaN values.

   A     B      C     Col4
 Nan   1.0    Nan     A,C
 1.0   Nan    1.0     B
 1.0   Nan    Nan     B,C

Any help?

Upvotes: 1

Views: 36

Answers (2)

prosti

Reputation: 46479

Naive approach:

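def f(r):
    # Collect the names of the columns whose value is the string 'Nan'
    ret = []
    if r['A'] == 'Nan': ret.append('A')
    if r['B'] == 'Nan': ret.append('B')
    if r['C'] == 'Nan': ret.append('C')
    return ','.join(ret)

# Apply f to every row (axis=1) and store the result in a new column
df['D'] = df.apply(f, axis=1)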

print(df)

     A    B    C
0  Nan  1.0  Nan
1  1.0  Nan  1.0
2  1.0  Nan  Nan
     A    B    C    D
0  Nan  1.0  Nan  A,C
1  1.0  Nan  1.0    B
2  1.0  Nan  Nan  B,C

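I tested this on strings. For real missing values, replace each comparison with pd.isna(r['A']) (and likewise for B and C), because np.nan == np.nan is False, so an equality test against np.nan never matches.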
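For real NaN values rather than the string 'Nan', the same row-wise idea can be written without hard-coding the column names. The following is only a sketch under assumptions: the sample frame is rebuilt from the desired output in the question, and the helper name nan_columns is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data matching the desired output in the question (assumed to be real NaN values)
df = pd.DataFrame({
    "A": [np.nan, 1.0, 1.0],
    "B": [1.0, np.nan, np.nan],
    "C": [np.nan, 1.0, np.nan],
})

def nan_columns(row):
    # row is a Series indexed by column name; keep only the labels whose value is missing
    return ",".join(row[row.isna()].index)

# Apply the helper to every row (axis=1) and store the joined names in a new column
df["Col4"] = df.apply(nan_columns, axis=1)
print(df)
```

Because the helper reads the labels from the row itself, it keeps working if columns are added or renamed. Like any apply with axis=1, it is likely to be slow on large frames.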

Upvotes: 1

jezrael

Reputation: 863651

Test for missing values with DataFrame.isna, then use DataFrame.dot with the column names (each followed by a comma), and finally strip the trailing comma with Series.str.rstrip:

df['col4'] = df.isna().dot(df.columns + ',').str.rstrip(',')
# if the values are the string 'Nan' instead of real missing values:
#df['col4'] = df.eq('Nan').dot(df.columns + ',').str.rstrip(',')
print(df)
     A    B    C col4
0  NaN  1.0  NaN  A,C
1  1.0  NaN  1.0    B
2  1.0  NaN  NaN  B,C
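The dot approach relies on the fact that in Python True * 'A,' is 'A,' and False * 'A,' is the empty string, so summing the products along a row concatenates the names of the missing columns. As a rough illustration of the same logic spelled out step by step, here is a sketch that avoids dot altogether. It assumes real NaN values and the sample data from the question; the variable names mask and names are chosen here only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data matching the output shown above (assumed to be real NaN values)
df = pd.DataFrame({
    "A": [np.nan, 1.0, 1.0],
    "B": [1.0, np.nan, np.nan],
    "C": [np.nan, 1.0, np.nan],
})

mask = df.isna().to_numpy()    # 2-D boolean array, True where a value is missing
names = df.columns.to_numpy()  # 1-D array of column labels

# For each row, select the labels where the mask is True and join them with commas
df["col4"] = [",".join(names[row]) for row in mask]
print(df)
```

Joining only the selected labels means there is no trailing comma to strip afterwards.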

Upvotes: 3
