Reputation: 648
Suppose i have a dataframe
import numpy
import pandas
ar = numpy.array([[1.2, 2.1, 0, 1.3], [0, 0, 1.2, 1.5], [2.1, 1.3, 0, 5.6], [0, 0, 2.4, 3.7]])
df = pandas.DataFrame(ar, index = ['0', '1', '2', '3'], columns = ['colA', 'colB', 'colC', 'colD'])
index colA colB colC colD ......
0 1.2 2.1 0 1.3
1 0 0 1.2 1.5
2 2.1 1.3 0 5.6
3 0 0 2.4 3.7
4
i want to get rows having non null values in the same columns like row 0 and 2 then row 1 and 3 as they have non null values in the same columns. how do i do this? The nonnull values can be different
Expected output
df1 = index colA colB colC colD ......
0 1.2 2.1 0 1.3
2 2.1 1.3 0 5.6
df2 = index colA colB colC colD
1 0 0 1.2 1.5
3 0 0 2.4 3.7
Upvotes: 1
Views: 45
Reputation: 862641
Missing values are not 0
, so intead df.isna()
is used df.eq(0)
, then boolean values are converted to strings and summed and passed to DataFrame.groupby
:
g = df.eq(0).astype(str).sum(axis=1)
#alternative solution
#g = df.eq(0).apply(tuple, axis=1)
for i, g in df.groupby(g):
print (g)
colA colB colC colD
0 1.2 2.1 0.0 1.3
2 2.1 1.3 0.0 5.6
colA colB colC colD
1 0.0 0.0 1.2 1.5
3 0.0 0.0 2.4 3.7
For list of DataFrames use:
dfs = [g for i, g in df.groupby(g)]
And then is possible select by indexing:
print (dfs[0])
print (dfs[1])
Upvotes: 2