HungryMolecule

Reputation: 337

How to get rid of values that don't meet a threshold in a pandas DataFrame (then plot it)

I would like to know how to plot only the dots that are above some threshold in a pandas scatter plot. Let's assume I have a DataFrame like this: I had two datasets and calculated the difference between them (column "Difference"). If the number in dataset1 was higher, I set "ARG" to True; if the number in dataset2 was higher, I set it to False. Finally, there is the atom number ("ATOM"):

import pandas as pd
import numpy as np
d = {'Difference': [1.095842, 1.295069, 1.021345, np.nan, 1.773725], 'ARG': [True, False, True, np.nan, False], 'ATOM': [1, 3, 5, 7, 9]}
df=pd.DataFrame(d)
df

   Difference    ARG  ATOM
0    1.095842   True     1
1    1.295069  False     3
2    1.021345   True     5
3         NaN    NaN     7
4    1.773725  False     9

Then I plot the DataFrame as a scatter plot where the dots are colored based on whether ARG is True or False.

df = df.dropna(axis=0)
df.plot(x='ATOM', y='Difference', kind='scatter', color=df.ARG.map({True: 'orange', False: 'blue'}))

But what if I am only interested in plotting dots (ATOMs) whose "Difference" value is above some threshold, e.g. Difference >= 1.15? Can I somehow drop all rows where the value in column "Difference" doesn't meet the required threshold? I tried

df = df[df.Difference >= 0.15]

But it returns an error: '>=' not supported between instances of 'method' and 'float'. Thank you for your suggestions.

Upvotes: 0

Views: 109

Answers (1)

Max

Reputation: 4045

The answer is slicing with a boolean mask (boolean indexing).

First you address the column you want to threshold (df['Difference']) to build a logical vector, and then you slice the DataFrame with that vector (df[lg]):

lg = df['Difference'] >= 0.15 # logical vector
df = df[lg]
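As a side note on why the bracket form matters: the error quoted in the question is what Python raises when attribute access returns a DataFrame method instead of a column. The sketch below assumes a hypothetical column named "diff" (not taken from the question), which collides with the DataFrame.diff method; whether that is what happened in the original data is only a guess.

```python
import pandas as pd

# Hypothetical frame: the column name "diff" collides with DataFrame.diff
frame = pd.DataFrame({"diff": [0.10, 0.20, 0.30]})

# Attribute access finds the method, not the column
is_method = callable(frame.diff)

# Comparing the bound method with a float fails with a TypeError
try:
    frame.diff >= 0.15
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# Bracket access always returns the column, so the comparison works
mask = frame["diff"] >= 0.15
kept = frame[mask]
```

Using df['name'] instead of df.name avoids this kind of collision regardless of what the column is called.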

So in a minimum working example, you get:

import pandas as pd
import numpy as np
# create dummy DataFrame
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=['Col 1','Col 2','Col 4','Col 5'])
# thresholding
lg = df['Col 1'] >= 42
# slicing
df[lg]

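len(df)      # number of rows before filtering
len(df[lg])  # rows kept; this count varies between runs because the data is random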

100

56
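Applied to the data from the question, the same idea could look roughly like the sketch below. The threshold of 1.15 is taken from the question text (the code there uses 0.15), and the helper plot_filtered is a hypothetical name introduced only for illustration; it needs matplotlib installed to actually draw.

```python
import numpy as np
import pandas as pd

d = {
    "Difference": [1.095842, 1.295069, 1.021345, np.nan, 1.773725],
    "ARG": [True, False, True, np.nan, False],
    "ATOM": [1, 3, 5, 7, 9],
}
df = pd.DataFrame(d)

threshold = 1.15  # assumed cut-off, taken from the question text

# NaN >= threshold evaluates to False, so the NaN row is filtered out as well
mask = df["Difference"] >= threshold
filtered = df[mask]

# One color per remaining row, based on the ARG flag
colors = filtered["ARG"].map({True: "orange", False: "blue"})


def plot_filtered(frame, point_colors):
    """Hypothetical helper: scatter plot of the rows that passed the threshold."""
    return frame.plot(x="ATOM", y="Difference", kind="scatter", color=point_colors)
```

Calling plot_filtered(filtered, colors) should then draw only the atoms whose Difference is at or above the threshold.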

Upvotes: 1
