Reputation: 233
I have a pandas dataframe as follows:
a b c
0 1.0 NaN NaN
1 NaN 7.0 5.0
2 3.0 8.0 3.0
3 4.0 9.0 2.0
4 5.0 0.0 NaN
Is there a simple way to split the dataframe into multiple dataframes based on non-null values?
a
0 1.0
b c
1 7.0 5.0
a b c
2 3.0 8.0 3.0
3 4.0 9.0 2.0
a b
4 5.0 0.0
Upvotes: 13
Views: 1169
Reputation: 323316
Using groupby with dropna:
for _, x in df.groupby(df.isnull().dot(df.columns)):
    print(x.dropna(axis=1))
a b c
2 3.0 8.0 3.0
3 4.0 9.0 2.0
b c
1 7.0 5.0
a
0 1.0
a b
4 5.0 0.0
We can save them in a dict:
d = {y: x.dropna(axis=1) for y, x in df.groupby(df.isnull().dot(df.columns))}
More info: dot is used to build, for each row, a key from the names of its null columns. Rows with the same key have the same null columns, so they are grouped together.
df.isnull().dot(df.columns)
Out[1250]:
0 bc
1 a
2
3
4 c
dtype: object
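To make explicit what that key contains, here is a minimal sketch that builds the same strings with a plain loop instead of dot. It assumes the frame from the question with string column labels; the names key and groups are only illustrative.

```python
import numpy as np
import pandas as pd

# The frame from the question.
df = pd.DataFrame({
    "a": [1, np.nan, 3, 4, 5],
    "b": [np.nan, 7, 8, 9, 0],
    "c": [np.nan, 5, 3, 2, np.nan],
})

# For each row, concatenate the labels of the columns that are null.
key = pd.Series(
    ["".join(col for col, is_null in zip(df.columns, row) if is_null)
     for row in df.isnull().to_numpy()],
    index=df.index,
)
print(key.tolist())  # ['bc', 'a', '', '', 'c']

# Rows sharing a key share the same null columns, so dropping the
# all-null columns of each group leaves only non-null values.
groups = {k: g.dropna(axis=1) for k, g in df.groupby(key)}
for k, g in groups.items():
    print(repr(k))
    print(g)
```

Note that the empty-string key is the group of rows with no nulls at all.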
Upvotes: 17
Reputation: 31
So here is a possible solution
import numpy as np
import pandas as pd

def getMap(some_list):
    # Encode a row as a string: "1" where the value is NaN, "0" otherwise
    return "".join(["1" if np.isnan(x) else "0" for x in some_list])

df = pd.DataFrame([[1, np.nan, np.nan], [np.nan, 7, 5], [3, 8, 3], [4, 9, 2], [5, 0, np.nan]])
print(df.head())

# Rows as plain Python lists
x = df[[0, 1, 2]].apply(lambda x: x.tolist(), axis=1).tolist()
nullMap = [getMap(y) for y in x]
nullSet = set(nullMap)
# One empty bucket per distinct null pattern
some_dict = {y: [] for y in nullSet}
for y in x:
    # Append the row's non-null values to the bucket for its pattern
    some_dict[getMap(y)].append([z for z in y if not np.isnan(z)])

dfs = [pd.DataFrame(y) for y in some_dict.values()]
for part in dfs:
    print(part)
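This gives the groups you asked for with the input you gave. :) Note that the frame above uses default integer column labels and a set, so the column headers and the order of the groups can differ from what is shown here.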
a
1.0
b c
7.0 5.0
a b c
3.0 8.0 3.0
4.0 9.0 2.0
a b
5.0 0.0
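If the original row and column labels should survive, one possible variation on the same null-pattern idea is to collect row labels per pattern and then slice the original frame. This is only a sketch under the assumption that the frame has the columns a, b, c from the question; null_pattern, rows_by_pattern and pieces are made-up names for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, np.nan, np.nan], [np.nan, 7, 5], [3, 8, 3], [4, 9, 2], [5, 0, np.nan]],
    columns=["a", "b", "c"],
)

def null_pattern(row):
    """Return e.g. '011' for a row whose 2nd and 3rd values are NaN."""
    return "".join("1" if pd.isna(v) else "0" for v in row)

# Collect row labels per null pattern, in first-seen order.
rows_by_pattern = {}
for label, row in zip(df.index, df.to_numpy()):
    rows_by_pattern.setdefault(null_pattern(row), []).append(label)

# Slice the original frame so index and column labels are kept.
pieces = {}
for pattern, labels in rows_by_pattern.items():
    keep = [col for col, bit in zip(df.columns, pattern) if bit == "0"]
    pieces[pattern] = df.loc[labels, keep]

for pattern, piece in pieces.items():
    print(pattern)
    print(piece)
```

Because a plain dict keeps insertion order, the pieces come out in the order their pattern first appears in the frame.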
Upvotes: 2