Reputation: 733
I have a large pandas DataFrame. I want to remove certain ranges of values (not a single value) that have a low frequency when plotting a histogram.
For the image below, let's say I want to remove all values of the DataFrame variable that fall in histogram bins with a count/frequency below 20. Does anyone have a solution for that?
# PR has value between 0 to 1700
data['PR'].hist(bins = 160) #image on the left
data_openforest['PR'].hist(bins = 160) #image on the right
Upvotes: 1
Views: 616
Reputation: 1170
Using pd.cut like this should work:
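import pandas as pd
import matplotlib.pyplot as plt
out = pd.cut(data_openforest['PR'], bins=160)
counts = out.value_counts(sort=False)
# Keep bins with at least 20 values (the question drops counts below 20)
counts[counts >= 20].plot.bar()
plt.show()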
If you want to filter your DataFrame, you have to do this:
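data_openforest['bin'] = pd.cut(data_openforest['PR'], bins=160)
# Number of rows per bin, as a two-column frame: bin, bin_freq
bin_freq = (data_openforest.groupby('bin', observed=False)['PR']
            .count().rename('bin_freq').reset_index())
# Attach each row's bin frequency to the original rows
data_openforest = data_openforest.merge(bin_freq, on='bin', how='left')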
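You can then filter the DataFrame on the bin-frequency column added by the merge, keeping only rows whose bin holds at least 20 values. Plot the result as a bar plot of the per-bin counts rather than calling hist again, because hist would recompute the bins from the filtered values.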
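As a rough illustration of the filtering step, here is a minimal sketch on synthetic data. It swaps the merge for `groupby(...).transform("count")`, which is a different route to the same per-row bin frequency. The names `df`, `min_count`, `bin_freq`, `filtered` and `filtered_counts` are made up for this example, and the random data only stands in for `data_openforest`, which is not shown in the thread.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for data_openforest (assumption: the real data is not available).
rng = np.random.default_rng(0)
df = pd.DataFrame({"PR": np.concatenate([
    rng.normal(400, 50, 5000),   # dense region around 400
    rng.uniform(0, 1700, 200),   # sparse values spread over the full range
])})

min_count = 20  # threshold taken from the question

# Assign each row to one of 160 equal-width bins.
df["bin"] = pd.cut(df["PR"], bins=160)

# For every row, count how many rows fall into the same bin.
df["bin_freq"] = df.groupby("bin", observed=True)["PR"].transform("count")

# Keep only rows whose bin holds at least min_count rows.
filtered = df[df["bin_freq"] >= min_count]

# Per-bin counts in bin order, with the now-empty bins dropped.
filtered_counts = filtered["bin"].value_counts(sort=False)
filtered_counts = filtered_counts[filtered_counts > 0]
```

Calling `filtered_counts.plot.bar()` followed by `plt.show()` (with `matplotlib.pyplot` imported as `plt`) should then draw only the bins that survive the threshold.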
Upvotes: 2