Points outside histogram range are discarded from plot

Question

I am plotting histograms of values and I want all of them to use the same bins, so the plots can be compared. To do so, I pass a vector x containing the bin edges.

import numpy as np
import matplotlib.pyplot as plt

data = np.array([0.1, 0.1, 0.2, 0.2, 0.2, 0.32])
x = np.linspace(0, 0.2, 9)  # 9 edges, i.e. 8 bins between 0 and 0.2
plt.hist(data, x)

What I notice is that if I specify the range of x to be between 0 and 0.2, then values larger than 0.2 (0.32 in the example) are discarded from the plot.

Is there a way of accumulating all values greater than 0.2 in the last bin and all values lower than 0.0 in the first bin?

Of course I can do something like

data[data>0.2] = 0.2
data[data<0.0] = 0.0

But I'd prefer not to modify my original array, and not to make a copy of it either unless there is no other way.

James · Accepted Answer

You can pass the bins argument as an array of bin edges placed wherever you want. The edges do not have to be evenly spaced, although uneven spacing will give bars of different widths. For your particular case, you can use the .clip method of the data array, which returns a clipped copy and leaves the original array unchanged.

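import numpy as np
import matplotlib.pyplot as plt

data = np.array([0.1, 0.1, 0.2, 0.2, 0.2, 0.32])
x = np.linspace(0, 0.2, 9)
# clip() returns a new array; out-of-range values land on the outer bin edges
plt.hist(data.clip(min=0, max=0.2), x)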
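
To check what clipping does to the counts without drawing anything, here is a minimal sketch using np.histogram on the same data and edges. It assumes that plt.hist bins values the same way np.histogram does; the variable names raw_counts and clipped_counts are made up for this illustration.

```python
import numpy as np

data = np.array([0.1, 0.1, 0.2, 0.2, 0.2, 0.32])
x = np.linspace(0, 0.2, 9)

# Without clipping, 0.32 lies outside the edges and is not counted.
raw_counts, _ = np.histogram(data, bins=x)

# With clipping, 0.32 becomes 0.2 and is counted in the last bin.
clipped_counts, _ = np.histogram(data.clip(min=0, max=0.2), bins=x)

print(raw_counts.sum(), clipped_counts.sum())  # 5 6
print(raw_counts[-1], clipped_counts[-1])      # 3 4
print(data.max())                              # 0.32, original array untouched
```

The clipped array is a temporary that only exists for the call, so the original data is never modified.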
