Reputation: 793
If I have a dataframe like this:
A B C
Nan 1.0 0.0
1.0 Nan 1.0
1.0 0.0 Nan
I want to create a new column in the dataframe that lists which columns in each row contain NaN values.
A B C Col4
Nan 1.0 Nan A,C
1.0 Nan 1.0 B
1.0 Nan Nan B,C
Any help?
Upvotes: 1
Views: 36
Reputation: 46479
Naive approach:
def f(r):
    ret = []
    if r['A'] == 'Nan': ret.append('A')
    if r['B'] == 'Nan': ret.append('B')
    if r['C'] == 'Nan': ret.append('C')
    return ','.join(ret)

df['D'] = df.apply(f, axis=1)
print(df)
A B C
0 Nan 1.0 Nan
1 1.0 Nan 1.0
2 1.0 Nan Nan
A B C D
0 Nan 1.0 Nan A,C
1 1.0 Nan 1.0 B
2 1.0 Nan Nan B,C
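I tested on strings. For real missing values, replace each comparison with pd.isna(r['A']) (and likewise for B and C), because r['A'] == np.nan is always False.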
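For real NaN values, the same row-wise idea can be sketched without hard-coding the column names. This is only a minimal sketch: the sample frame is rebuilt from the printed output above, and the helper name nan_columns is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printed output above (assumed to hold real NaN).
df = pd.DataFrame({
    "A": [np.nan, 1.0, 1.0],
    "B": [1.0, np.nan, np.nan],
    "C": [np.nan, 1.0, np.nan],
})

def nan_columns(row):
    # row.isna() flags the missing entries; use it to pick the matching labels.
    missing = row.index[row.isna().to_numpy()]
    return ",".join(missing)

df["D"] = df.apply(nan_columns, axis=1)
print(df)
```

Like any apply with axis=1, this runs a Python function per row, so it is likely to be slow on large frames.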
Upvotes: 1
Reputation: 863651
Compare with DataFrame.isna and use DataFrame.dot with the column names, then remove the trailing comma with Series.str.rstrip:
df['col4'] = df.isna().dot(df.columns + ',').str.rstrip(',')
# if the missing values are stored as the string 'Nan' instead:
#df['col4'] = df.eq('Nan').dot(df.columns + ',').str.rstrip(',')
print(df)
A B C col4
0 NaN 1.0 NaN A,C
1 1.0 NaN 1.0 B
2 1.0 NaN NaN B,C
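To see why the dot product produces the labels, it may help to expand it by hand. The sketch below is not the one-liner itself but a plain-Python equivalent of what it computes per row, assuming the same sample data; it relies on True * 'A,' being 'A,' and False * 'A,' being the empty string.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printed output above (assumed to hold real NaN).
df = pd.DataFrame({
    "A": [np.nan, 1.0, 1.0],
    "B": [1.0, np.nan, np.nan],
    "C": [np.nan, 1.0, np.nan],
})

mask = df.isna()                         # True where a value is missing
labels = [c + "," for c in df.columns]   # ['A,', 'B,', 'C,']

# Each row of the dot product is the "sum" of flag * label over the columns,
# which for strings means concatenating the labels whose flag is True.
joined = [
    "".join(flag * label for flag, label in zip(row, labels))
    for row in mask.to_numpy().tolist()  # tolist() yields plain Python bools
]

# Drop the trailing comma, as str.rstrip(',') does in the one-liner.
df["col4"] = [s.rstrip(",") for s in joined]
print(df)
```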
Upvotes: 3