Reputation: 267
I have the following df:
mask = df.apply(lambda row: True if row.isnull().any() else False, axis=1)
df[HAS_MISSING_VALUES] = mask
date kpi1 kpi2 kpi3 location has_missing_values
0 2019-01-25 147.0 nan 82.876859 team1 True
1 2019-01-26 147.0 41.0 71.178657 team1 False
2 2019-01-27 nan 42.0 81.812117 team2 True
3 2019-01-28 147.0 42.0 75.754279 team2 False
4 2019-01-29 nan 42.0 nan team4 True
5 2019-01-30 nan nan nan team4 True
I need to create a new column that for each row contains its null values names.
so for example the output would be:
date kpi1 kpi2 kpi3 location missing_kpis
0 2019-01-25 147.0 nan 82.876859 team1 [kpi2]
1 2019-01-26 147.0 41.0 71.178657 team1 []
2 2019-01-27 nan 42.0 81.812117 team2 [kpi1]
3 2019-01-28 147.0 42.0 75.754279 team2 []
4 2019-01-29 nan 42.0 nan team4 [kpi1,kpi3]
5 2019-01-30 nan nan nan team4 [kpi1,kpi2,kpi3]
but been stuck from there
Upvotes: 1
Views: 2636
Reputation: 5459
You can use isnull
with apply
.
df['missing_kpis'] = df.apply(lambda x: ','.join(x[x.isnull()].index),axis=1)
Test
df
date kpi1 kpi2 kpi3 location
0 2019-01-25 147.0 NaN 82.876859 team1
1 2019-01-26 147.0 41.0 71.178657 team1
2 2019-01-27 NaN 42.0 81.812117 team2
3 2019-01-28 147.0 42.0 75.754279 team2
4 2019-01-29 NaN 42.0 NaN team4
5 2019-01-30 NaN NaN NaN team4
df['missing_kpis'] = df.apply(lambda x: ','.join(x[x.isnull()].index),axis=1)
df
date kpi1 kpi2 kpi3 location missing_kpis
0 2019-01-25 147.0 NaN 82.876859 team1 kpi2
1 2019-01-26 147.0 41.0 71.178657 team1
2 2019-01-27 NaN 42.0 81.812117 team2 kpi1
3 2019-01-28 147.0 42.0 75.754279 team2
4 2019-01-29 NaN 42.0 NaN team4 kpi1,kpi3
5 2019-01-30 NaN NaN NaN team4 kpi1,kpi2,kpi3
Upvotes: 3
Reputation: 510
This is what I came up with, hope this answers your question
missing = []
for row in data.index:
missing.append(list(data.iloc[row][data.iloc[row].isnull()].index))
data['Has_Missing_values'] = missing
print(data)
Upvotes: 2