Reputation: 337
How can I plot only the dots that are above some threshold in a pandas scatter plot? Let's assume I have a DataFrame like the one below. I had two datasets and calculated the difference between them (column "Difference"). If the number in dataset1 was higher, I set "ARG" to True; if the number in dataset2 was higher, I set it to False. Finally, there is the atom number ("ATOM"):
import pandas as pd
import numpy as np
d = {'Difference': [1.095842, 1.295069, 1.021345, np.nan, 1.773725], 'ARG': [True, False, True, np.nan, False], 'ATOM': [1, 3, 5, 7, 9]}
df=pd.DataFrame(d)
df
   Difference    ARG  ATOM
0    1.095842   True     1
1    1.295069  False     3
2    1.021345   True     5
3         NaN    NaN     7
4    1.773725  False     9
Then I plot the DataFrame as a scatter plot where the dots are colored according to whether ARG is True or False.
df = df.dropna(axis=0)
df.plot(x='ATOM', y='Difference', kind='scatter', color=df.ARG.map({True: 'orange', False: 'blue'}))
But what if I am only interested in plotting the dots (atoms) whose "Difference" value is above some threshold, e.g. Difference >= 1.15? Can I somehow drop all rows where the value in column "Difference" does not meet the required threshold? I tried
df = df[df.Difference >= 0.15]
But it raises an error: '>=' not supported between instances of 'method' and 'float'. Thank you for your suggestions.
Upvotes: 0
Views: 109
Reputation: 4045
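The answer is slicing with a boolean mask.
First you address the column you want to use for the threshold (df['Difference']) and compare it with the threshold, which gives a logical vector with one True/False entry per row. Then you slice the DataFrame with that logical vector (df[lg]), which keeps only the rows where it is True.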
lg = df['Difference'] >= 0.15 # logical vector
df = df[lg]
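Applied to the DataFrame from the question, this could look roughly like the sketch below. The threshold of 1.15 is taken from the question's text, and the names `threshold`, `mask`, `filtered`, `colors` and `plot_filtered` are only illustrative. The plotting helper assumes matplotlib is installed.

```python
import numpy as np
import pandas as pd

# Data copied from the question
d = {'Difference': [1.095842, 1.295069, 1.021345, np.nan, 1.773725],
     'ARG': [True, False, True, np.nan, False],
     'ATOM': [1, 3, 5, 7, 9]}
df = pd.DataFrame(d).dropna(axis=0)

threshold = 1.15                      # assumed value, from the question's text
mask = df['Difference'] >= threshold  # boolean Series, one entry per row
filtered = df[mask]                   # keeps only rows where mask is True

# Build the colors from the filtered rows so they line up with the plotted dots
colors = filtered['ARG'].map({True: 'orange', False: 'blue'})


def plot_filtered(frame, frame_colors):
    """Scatter plot of the given rows (hypothetical helper, needs matplotlib)."""
    return frame.plot(x='ATOM', y='Difference', kind='scatter', color=frame_colors)
```

Calling `plot_filtered(filtered, colors)` then draws only the atoms that passed the threshold. Note that the color mapping is computed after filtering, so it has the same length as the data being plotted.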
So in a minimal working example, you get:
import pandas as pd
import numpy as np
# create dummy DataFrame
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=['Col 1','Col 2','Col 4','Col 5'])
# thresholding
lg = df['Col 1'] >= 42
# slicing
df[lg]
print(len(df))      # 100
print(len(df[lg]))  # 56 in this run; the exact number varies because the data is random
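As for the error in the question: a message mentioning 'method' suggests that the attribute access resolved to a DataFrame method instead of a column. That happens when a column name collides with a method name. The real column name is not known here, so the sketch below uses a hypothetical column called `diff`, which collides with `DataFrame.diff`, to show why bracket access is the safer choice.

```python
import pandas as pd

# Hypothetical column name 'diff', chosen because it collides with DataFrame.diff
demo = pd.DataFrame({'diff': [0.10, 0.20, 0.30]})

attr = demo.diff  # this is the bound method DataFrame.diff, not the column

try:
    demo.diff >= 0.15  # compares a method with a float
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Bracket access always refers to the column, so the comparison works
kept = demo[demo['diff'] >= 0.15]
```

Here `error_message` holds the same kind of message as in the question, while `kept` contains only the rows at or above the threshold.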
Upvotes: 1