pranav nerurkar
pranav nerurkar

Reputation: 648

Finding rows with non null values in the same columns

Suppose i have a dataframe

import numpy
import pandas 
ar = numpy.array([[1.2, 2.1, 0, 1.3], [0, 0, 1.2, 1.5], [2.1, 1.3, 0, 5.6],  [0, 0, 2.4, 3.7]])
df = pandas.DataFrame(ar, index = ['0', '1', '2', '3'], columns = ['colA', 'colB', 'colC', 'colD'])



index colA  colB  colC  colD ......
0      1.2   2.1   0    1.3
1      0      0    1.2   1.5
2      2.1    1.3  0     5.6
3      0      0    2.4   3.7
4

i want to get rows having non null values in the same columns like row 0 and 2 then row 1 and 3 as they have non null values in the same columns. how do i do this? The nonnull values can be different

Expected output

 df1 =   index colA  colB  colC  colD ......
    0      1.2   2.1   0    1.3
    2      2.1    1.3  0     5.6
    
   

df2 =    index colA  colB  colC  colD 
   
    1      0      0    1.2   1.5
    3      0      0    2.4   3.7

Upvotes: 1

Views: 45

Answers (1)

jezrael
jezrael

Reputation: 862641

Missing values are not 0, so intead df.isna() is used df.eq(0), then boolean values are converted to strings and summed and passed to DataFrame.groupby:

g = df.eq(0).astype(str).sum(axis=1)
#alternative solution
#g = df.eq(0).apply(tuple, axis=1)

for i, g in df.groupby(g):
    print (g)
   colA  colB  colC  colD
0   1.2   2.1   0.0   1.3
2   2.1   1.3   0.0   5.6
   colA  colB  colC  colD
1   0.0   0.0   1.2   1.5
3   0.0   0.0   2.4   3.7

For list of DataFrames use:

dfs = [g for i, g in df.groupby(g)]

And then is possible select by indexing:

print (dfs[0])
print (dfs[1])

Upvotes: 2

Related Questions