Reputation: 113
I have a pandas data frame that looks like this:
      a     b      c
0   NaN   2.0  165.0
1   NaN   9.0    NaN
2   NaN   NaN    NaN
3  15.0  15.0    NaN
4   5.0   NaN   11.0
I would like to add a column that summarizes the missing values: for every row, it should list the columns in which that row has a missing value. Something like this:
      a     b      c  summary
0   NaN   2.0  165.0  a
1   NaN   9.0    NaN  a + c
2   NaN   NaN    NaN  a + b + c
3  15.0  15.0    NaN  c
4   5.0   NaN   11.0  b
Upvotes: 3
Views: 588
Reputation: 164613
Here is one way.
import pandas as pd
import numpy as np

df = pd.DataFrame(
    [
        [np.nan, 2, 165],
        [np.nan, 9, np.nan],
        [np.nan, np.nan, np.nan],
        [15, 15, np.nan],
        [5, np.nan, 11],
    ],
    columns=['a', 'b', 'c'],
)

# For each row, join the names of the columns whose value is NaN.
df['summary'] = df.apply(
    lambda row: ' + '.join(col for col in ['a', 'b', 'c'] if np.isnan(row[col])),
    axis=1,
)
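If you would rather not hard-code the column names or call `np.isnan` on every cell, a possible variation is to compute the missing-value mask once with `isna()` and build each label from it. This is only a sketch of that idea; the names `cols` and `mask` are my own and are not required by pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [np.nan, 2, 165],
        [np.nan, 9, np.nan],
        [np.nan, np.nan, np.nan],
        [15, 15, np.nan],
        [5, np.nan, 11],
    ],
    columns=['a', 'b', 'c'],
)

# Take the column list before adding the new column (illustrative name).
cols = list(df.columns)

# Boolean frame: True wherever a value is missing.
mask = df[cols].isna()

# For each row of the mask, join the names of the columns flagged True.
df['summary'] = [
    ' + '.join(col for col, missing in zip(cols, row) if missing)
    for row in mask.to_numpy()
]

print(df)
```

Because `isna()` also handles `None` and non-float columns, this should be a little more forgiving than `np.isnan` if the frame contains object or datetime data.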
Upvotes: 3