yovel cohen
yovel cohen

Reputation: 267

Pandas, list all the columns that have null values for each row

I have the following df:

mask = df.apply(lambda row: True if row.isnull().any() else False, axis=1)
df[HAS_MISSING_VALUES] = mask
         date      kpi1       kpi2      kpi3      location has_missing_values
0  2019-01-25      147.0      nan       82.876859   team1   True
1  2019-01-26      147.0      41.0      71.178657   team1   False
2  2019-01-27      nan        42.0      81.812117   team2   True
3  2019-01-28      147.0      42.0      75.754279   team2   False
4  2019-01-29      nan        42.0      nan         team4   True
5  2019-01-30      nan        nan       nan         team4   True

I need to create a new column that for each row contains its null values names.

so for example the output would be:

         date      kpi1       kpi2      kpi3      location missing_kpis
0  2019-01-25      147.0      nan       82.876859   team1  [kpi2]
1  2019-01-26      147.0      41.0      71.178657   team1  []
2  2019-01-27      nan        42.0      81.812117   team2  [kpi1]
3  2019-01-28      147.0      42.0      75.754279   team2  []
4  2019-01-29      nan        42.0      nan         team4  [kpi1,kpi3]
5  2019-01-30      nan        nan       nan         team4  [kpi1,kpi2,kpi3]

but been stuck from there

Upvotes: 1

Views: 2636

Answers (2)

Arun Palanisamy
Arun Palanisamy

Reputation: 5459

You can use isnull with apply.

df['missing_kpis'] = df.apply(lambda x: ','.join(x[x.isnull()].index),axis=1)

Test

df
         date   kpi1  kpi2       kpi3 location
0  2019-01-25  147.0   NaN  82.876859    team1
1  2019-01-26  147.0  41.0  71.178657    team1
2  2019-01-27    NaN  42.0  81.812117    team2
3  2019-01-28  147.0  42.0  75.754279    team2
4  2019-01-29    NaN  42.0        NaN    team4
5  2019-01-30    NaN   NaN        NaN    team4
df['missing_kpis'] = df.apply(lambda x: ','.join(x[x.isnull()].index),axis=1)
df
         date   kpi1  kpi2       kpi3 location    missing_kpis
0  2019-01-25  147.0   NaN  82.876859    team1            kpi2
1  2019-01-26  147.0  41.0  71.178657    team1                
2  2019-01-27    NaN  42.0  81.812117    team2            kpi1
3  2019-01-28  147.0  42.0  75.754279    team2                
4  2019-01-29    NaN  42.0        NaN    team4       kpi1,kpi3
5  2019-01-30    NaN   NaN        NaN    team4  kpi1,kpi2,kpi3

Upvotes: 3

Mohil Patel
Mohil Patel

Reputation: 510

This is what I came up with, hope this answers your question

missing = []
for row in data.index:
  missing.append(list(data.iloc[row][data.iloc[row].isnull()].index))

data['Has_Missing_values'] = missing
print(data)

dataframe

Upvotes: 2

Related Questions