How to extract non-nans from pandas DataFrame

Question

I have read through many posts but I am still not able to figure out the following. How can I extract non-nans from a pandas DataFrame? Eg.:

import pandas as pd
import numpy as np

tmp = pd.DataFrame({'Women': {0: 28, 1: 42, 2: 52, 3: 62},
 'Men': {0: 46.0, 1: 55.0, 2: np.nan, 3: np.nan},
 'Girls': {0: 40.0, 1: np.nan, 2: np.nan, 3: np.nan},
 'Boys': {0: 47.0, 1: 49.0, 2: 35.1, 3: np.nan}}

)

The output would be a list or dict looking like:

[[28, 42, 52, 62], [46.0, 55.0], [40.0], [47.0, 49.0, 35.1]]

I can do this with looping over columns and dropping nans:

x=[]
for col in tmp.columns:
    x.append(list(tmp[col].dropna()))

But I was curious if there is some nicer way without the for loop or using some pandas features to make it fast and nice. Ideally a one-liner since this should be in a bigger function so as simple, readible and short as possible :). Thanks

jezrael · Accepted Answer

First idea with list comprehension:

L = [[y for y in x if pd.notna(y)] for x in tmp.T.to_numpy()]
print (L)
[[28.0, 42.0, 52.0, 62.0], [46.0, 55.0], [40.0], [47.0, 49.0, 35.1]]

Without looping is possible, but not sure if better performance if large data:

L = tmp.stack().groupby(level=1, sort=False).agg(list).tolist()
print (L)
[[28.0, 42.0, 52.0, 62.0], [46.0, 55.0], [40.0], [47.0, 49.0, 35.1]]

How to extract non-nans from pandas DataFrame

Answers (1)

Related Questions