Ep1c1aN

Reputation: 733

Manipulate histogram in pandas

I have a large DataFrame in pandas. When plotting a histogram, I want to remove certain ranges of values (not single values) that have a low frequency.

For the images below, let's say I want to remove all values of the DataFrame variable whose bin count (frequency) is below 20. Does anyone have a solution for that?

# PR has values between 0 and 1700
data['PR'].hist(bins = 160) #image on the left
data_openforest['PR'].hist(bins = 160) #image on the right

[Images: histogram of data['PR'] (left) and data_openforest['PR'] (right)]

Upvotes: 1

Views: 616

Answers (1)

steamdragon

Reputation: 1170

Using pd.cut like this should work:

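import pandas as pd
import matplotlib.pyplot as plt
# Bin PR into 160 equal-width intervals and count the rows in each bin
out = pd.cut(data_openforest['PR'], bins=160)
counts = out.value_counts(sort=False)
# Keep only bins with at least 20 rows (i.e. drop those below 20)
counts[counts >= 20].plot.bar()
plt.show()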
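If you would rather keep a numeric x-axis than interval labels on the bars, a minimal sketch along these lines may help. It assumes synthetic data (`pr` is a hypothetical stand-in for `data_openforest['PR']`, and `MIN_COUNT` is a hypothetical name for the threshold of 20) and swaps `pd.cut` for `np.histogram`, which produces equal-width bins over the data range, as matplotlib's `hist` does. Bars are drawn only for bins with at least `MIN_COUNT` values, at their real bin positions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for data_openforest['PR'] (values clipped to 0..1700)
rng = np.random.default_rng(0)
pr = pd.Series(rng.gamma(shape=2.0, scale=150.0, size=5000).clip(0, 1700), name="PR")

MIN_COUNT = 20  # hypothetical name for the threshold from the question

# 160 equal-width bins over the data range: per-bin counts plus the 161 edges
counts, edges = np.histogram(pr, bins=160)
keep = counts >= MIN_COUNT  # boolean mask, one entry per bin

# Draw only the bins that pass the threshold, anchored at their left edges
fig, ax = plt.subplots()
ax.bar(edges[:-1][keep], counts[keep], width=np.diff(edges)[keep], align="edge")
ax.set_xlabel("PR")
ax.set_ylabel("count")
```

Call `plt.show()` afterwards to display the figure. Dropped bins simply leave gaps on the axis, so the remaining bars stay where they would be in the full histogram.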

If you want to filter your DataFrame, you have to do this:

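# Label each row with the PR bin it falls into
data_openforest['bin'] = pd.cut(data_openforest['PR'], bins=160)
# Number of rows per bin (one count column per original column)
bin_freq = data_openforest.groupby('bin').count()
# Attach the per-bin counts to every row; overlapping column names get the
# suffixes, so 'PR' becomes 'PR_bin' and its bin count becomes 'PR_bin_freq'
data_openforest = data_openforest.merge(bin_freq, 
                                        on='bin', 
                                        how='left',
                                        suffixes=("_bin", 
                                                  "_bin_freq"))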

You can then filter the DataFrame on that count column, e.g. data_openforest[data_openforest['PR_bin_freq'] >= 20]. Note that you will then have to use a bar plot rather than a histogram.
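As an alternative to the merge, a sketch using `groupby(...).transform('count')` attaches each row's bin count directly, keeps the original index and the `PR` column name, and lets you filter in one step. This is a swapped-in technique, not the method from the answer above; the DataFrame is synthetic and `MIN_COUNT` is a hypothetical name for the threshold of 20.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data_openforest DataFrame
rng = np.random.default_rng(1)
data_openforest = pd.DataFrame({"PR": rng.gamma(2.0, 150.0, size=5000).clip(0, 1700)})

MIN_COUNT = 20  # hypothetical name for the threshold from the question

# Assign each row to one of 160 equal-width bins (same pd.cut call as above)
data_openforest["bin"] = pd.cut(data_openforest["PR"], bins=160)

# transform('count') returns one value per row, aligned with the original index,
# so no merge is needed and the existing columns keep their names
data_openforest["bin_freq"] = (
    data_openforest.groupby("bin", observed=True)["PR"].transform("count")
)

# Keep only rows that fall in bins with at least MIN_COUNT members
filtered = data_openforest[data_openforest["bin_freq"] >= MIN_COUNT]
```

If you plot `filtered['PR'].hist(bins=160)`, the remaining values are re-binned over their own range. To reuse the original bins, you could get the edges with `_, edges = pd.cut(data_openforest['PR'], bins=160, retbins=True)` and pass `bins=edges` instead.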

Upvotes: 2
