adamsorbie
adamsorbie

Reputation: 53

Drop rows in pandas conditionally based on values across all columns

I have a count matrix which represents % abundance, with samples as columns and observations as rows e.g:

#OTUId  101.BGd_295  103.BGd_309  105.BGd_310  11.BGd_99   123.BGd_312  
OTU_200 0.016806723  0.23862789   0.148210883  0.6783      0.126310471  
OTU_54  0.253542133  0.169383866  0            0.113679432 0.173943294
OTU_2   0.033613445  16.58463833  19.66970146  16.06669119 20.92537833  

I am trying to filter the dataframe using pandas, keeping only those rows which have at least one value above 0.5%. I initially found this

df = df[(df > 0.5).sum(axis=1) >= 1]

which i thought would do the trick but now as far as I understand this will instead keep those in which the sum across the row is greater than 0.5. How can I modify this to suit?

thanks!

Upvotes: 0

Views: 410

Answers (1)

jezrael
jezrael

Reputation: 862581

I think simplier solution is use condition for boolean DataFrame and then check by any for at least one True per row, last filter by boolean indexing:

print (df.drop('#OTUId',axis=1) > 0.5)
   101.BGd_295  103.BGd_309  105.BGd_310  11.BGd_99  123.BGd_312
0        False        False        False       True        False
1        False        False        False      False        False
2        False         True         True       True         True

print ((df.drop('#OTUId',axis=1) > 0.5).any(axis=1))
0     True
1    False
2     True
dtype: bool

df = df[(df.drop('#OTUId',axis=1) > 0.5).any(axis=1)]
print (df)
    #OTUId  101.BGd_295  103.BGd_309  105.BGd_310  11.BGd_99  123.BGd_312
0  OTU_200     0.016807     0.238628     0.148211   0.678300     0.126310
2    OTU_2     0.033613    16.584638    19.669701  16.066691    20.925378

Your code:

df = df[(df > 0.5).sum(axis=1) >= 1]

#boolean mask
print (df > 0.5)
   #OTUId  101.BGd_295  103.BGd_309  105.BGd_310  11.BGd_99  123.BGd_312
0    True        False        False        False       True        False
1    True        False        False        False      False        False
2    True        False         True         True       True         True

#count True values per row
print ((df > 0.5).sum(axis=1))
0    2
1    1
2    5
dtype: int64

#check values by condition
print ((df > 0.5).sum(axis=1) >= 1)
0    True
1    True
2    True
dtype: bool

Upvotes: 2

Related Questions