Reputation: 2508
I have read through many posts but I am still not able to figure out the following. How can I extract non-nans from a pandas DataFrame? Eg.:
import pandas as pd
import numpy as np
tmp = pd.DataFrame({'Women': {0: 28, 1: 42, 2: 52, 3: 62},
'Men': {0: 46.0, 1: 55.0, 2: np.nan, 3: np.nan},
'Girls': {0: 40.0, 1: np.nan, 2: np.nan, 3: np.nan},
'Boys': {0: 47.0, 1: 49.0, 2: 35.1, 3: np.nan}}
)
The output would be a list
or dict
looking like:
[[28, 42, 52, 62], [46.0, 55.0], [40.0], [47.0, 49.0, 35.1]]
I can do this with looping over columns and dropping nans:
x=[]
for col in tmp.columns:
x.append(list(tmp[col].dropna()))
But I was curious if there is some nicer way without the for loop or using some pandas features to make it fast and nice. Ideally a one-liner since this should be in a bigger function so as simple, readible and short as possible :). Thanks
Upvotes: 0
Views: 72
Reputation: 862396
First idea with list comprehension:
L = [[y for y in x if pd.notna(y)] for x in tmp.T.to_numpy()]
print (L)
[[28.0, 42.0, 52.0, 62.0], [46.0, 55.0], [40.0], [47.0, 49.0, 35.1]]
Without looping is possible, but not sure if better performance if large data:
L = tmp.stack().groupby(level=1, sort=False).agg(list).tolist()
print (L)
[[28.0, 42.0, 52.0, 62.0], [46.0, 55.0], [40.0], [47.0, 49.0, 35.1]]
Upvotes: 3