Micawber

Reputation: 707

Group rows by a column in a pandas DataFrame (filled only with boolean values)

I'm struggling with an issue on my pandas DataFrame. I guess I should use the groupby method, but I can't figure out how to use it properly.

My data looks like this (but with ~200 rows and 5,000 columns):

            K00001  K00002  K00003  K00004  K00005  K00009  K00011  K00013   OTU
Root100     True    False   False   True    False   False   True    False    OTU1
Root102     True    False   False   True    False   False   True    False    OTU1
Root105     True    True    False   True    False   False   True    False    OTU1
Root107     True    False   False   True    False   False   True    False    OTU2
Root11      True    False   False   True    True    False   True    False    OTU2

I'd like to group the rows by the last column, 'OTU', to get:

        K00001  K00002  K00003  K00004  K00005  K00009  K00011  K00013   
OTU1    True    True    False   True    False   False   True    False    
OTU2    True    False   False   True    True    False   True    False

Each cell of the result should be the logical OR of the corresponding cells in the group (for instance, for K00002 in OTU1, it would be False or False or True = True).
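To pin down the intended semantics, here is a minimal sketch (assuming pandas is installed, and using only a subset of the K columns from the sample above for brevity) that computes the OR of each column within each OTU group explicitly, with functools.reduce and operator.or_. The names df and result are illustrative, not from the original post.

```python
import functools
import operator

import pandas as pd

# A subset of the sample rows from the question (hypothetical reduced example)
df = pd.DataFrame(
    {
        "K00001": [True, True, True, True, True],
        "K00002": [False, False, True, False, False],
        "K00004": [True, True, True, True, True],
        "K00005": [False, False, False, False, True],
        "OTU": ["OTU1", "OTU1", "OTU1", "OTU2", "OTU2"],
    },
    index=["Root100", "Root102", "Root105", "Root107", "Root11"],
)

# Explicit logical OR over the rows of each group, one column at a time
result = {}
for otu, group in df.groupby("OTU"):
    values = group.drop(columns="OTU")
    result[otu] = {
        col: functools.reduce(operator.or_, values[col]) for col in values.columns
    }

print(result["OTU1"]["K00002"])  # False or False or True -> True
```

This loop is only a reference for what each output cell should contain; it is far slower than a vectorized groupby reduction on 5,000 columns.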

Can someone give me a hint?

Thanks.

Upvotes: 1

Views: 184

Answers (1)

jezrael

Reputation: 863291

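Use GroupBy.any, which returns True for a column within a group if at least one row of that group is True, i.e. a logical OR over the group's rows: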

df = df.groupby('OTU').any()
print(df)
      K00001  K00002  K00003  K00004  K00005  K00009  K00011  K00013
OTU                                                                 
OTU1    True    True   False    True   False   False    True   False
OTU2    True   False   False    True    True   False    True   False
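A self-contained version of the answer might look like the sketch below, assuming only pandas and rebuilding the five sample rows from the question column by column. GroupBy.all is included only to show the corresponding AND reduction, which the question did not ask for; the variable names are illustrative.

```python
import pandas as pd

# Rebuild the five sample rows from the question (column by column)
df = pd.DataFrame(
    {
        "K00001": [True, True, True, True, True],
        "K00002": [False, False, True, False, False],
        "K00003": [False, False, False, False, False],
        "K00004": [True, True, True, True, True],
        "K00005": [False, False, False, False, True],
        "K00009": [False, False, False, False, False],
        "K00011": [True, True, True, True, True],
        "K00013": [False, False, False, False, False],
        "OTU": ["OTU1", "OTU1", "OTU1", "OTU2", "OTU2"],
    },
    index=["Root100", "Root102", "Root105", "Root107", "Root11"],
)

# Logical OR within each group: True if any row of the group is True
any_per_otu = df.groupby("OTU").any()
print(any_per_otu)

# For comparison only: logical AND within each group uses GroupBy.all
all_per_otu = df.groupby("OTU").all()
print(all_per_otu)
```

Since 'OTU' becomes the index of the grouped result, calling reset_index() afterwards turns it back into a regular column if that is needed.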

Upvotes: 3
