Reputation: 25370
I have some image data that I have plotted in a histogram using numpy as shown in the code below. The problem I am having is that the x axis goes up in steps of 1, but the bin width is approximately 1.3 (I roughly calculated by zooming in and looking at the bin width).
This leads to a histogram which looks like this:
As you can see at certain points the histogram goes down to zero. If I zoom in, the points at which the value is 0 are NOT integers. Because my data are integers the number 550.8 will obviously appear 0 times which is causing the histogram to have the appearance above (I think).
I can get around this problem if I increase the number of bins from 100 to 1000. This leaves me with the histogram below:
So I've finally got to my question (apologies for the long post!): is there a way to join the bins (when using a large number of them, as I am, to get around my initial problem) using np.histogram? I suppose this is just aesthetics and isn't essential, but it would look better.
There are other posts on here which I have looked at, but almost all of them use plt.hist for their histogram as opposed to np.histogram.
My code:
def histo():
    heights,edges = np.histogram(data, bins=100, range=(minvalue,maxvalue))
    edges = edges[:-1]+(edges[1]-edges[0]) ### not entirely sure what this line is actually doing
    fig, ax = plt.subplots()
    ax.plot(edges,heights)
    ax.set(title=title, xlabel='ADC Value(DN/40)', ylabel='Frequency')
    #do some analysis of the data between two clicks
    point1, point2 = fig.ginput(2)
    ax.axvspan(point1[0], point2[0], color='blue', alpha=0.5)
    mask = (edges>point1[0]) & (edges<point2[0])
    ## more analysis code ##

data = someimage_data
histo()
Upvotes: 1
Views: 1631
Reputation: 35136
As you suspected yourself, the problem is that your integer data need custom-fit bins to get a pretty histogram. As a matter of fact, this is usually true for histograms.
Consider the following reconstruction of your problem:
import numpy as np
import matplotlib.pyplot as plt
# generate data: normally distributed integers, kept strictly between 560 and 650
dat = np.floor(np.random.randn(10000)*20+620)
data = dat[(560<dat) & (dat<650)]
# do what you're doing
heights,edges = np.histogram(data, bins=100, range=(data.min(),data.max()))
edges = edges[:-1]+(edges[1]-edges[0]) # shift first x coordinate to edges[1]
                                       # and drop last point: 1 more edge than bins
fig, ax = plt.subplots()
ax.plot(edges,heights)
The result is convincingly ugly:
The problem is that you're using 100 bins, but your integer values lie strictly between 560 and 650, so there are at most 89 distinct values: at least 11 of the 100 bins are guaranteed to be empty!
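To make the counting argument concrete, here is a minimal sketch that counts the empty bins directly. The sample is a made-up one (every integer from 561 to 649 repeated 50 times), not the asker's image data, and the names n_unique and n_empty are just for illustration:

```python
import numpy as np

# Hypothetical integer sample: every value from 561 to 649, each repeated 50 times
data = np.repeat(np.arange(561, 650), 50)

# Same call as in the question: 100 equal-width bins over the data range
heights, edges = np.histogram(data, bins=100, range=(data.min(), data.max()))

n_unique = np.unique(data).size                 # distinct integer values present
n_empty = int(np.count_nonzero(heights == 0))   # bins that caught no value at all

print(n_unique, n_empty)
```

Each bin here is narrower than 1, so it can hold at most one integer value, and the number of empty bins is simply the bin count minus the number of distinct values.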
One easy solution is to set a slightly smaller bin count than the number of your possible unique integer values:
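# do what you're doing
data_range = [data.min(),data.max()]  # renamed to avoid shadowing the built-in range()
# np.histogram expects an integer bin count, so cast the result of np.ceil
n_bins = int(np.ceil((data_range[1]-data_range[0])*0.95))
heights,edges = np.histogram(data, bins=n_bins, range=data_range)
edges = edges[:-1]+(edges[1]-edges[0]) # shift first x coordinate to edges[1]
fig, ax = plt.subplots()
ax.plot(edges,heights)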
It's getting better:
but clearly there are artifacts from the fact that a few bins contain multiple integers, while others don't. This is a less shocking instance of the original problem.
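The uneven occupancy can be checked without plotting. The sketch below histograms a hypothetical set of values in which each integer from 561 to 649 appears exactly once, so each bin height is just the number of integers that bin covers (the names values, per_bin and n_bins are illustrative):

```python
import numpy as np

# Hypothetical input: each integer from 561 to 649 exactly once
values = np.arange(561, 650)
lo, hi = values.min(), values.max()

# Slightly fewer bins than the width of the integer range, as in the snippet above
n_bins = int(np.ceil((hi - lo) * 0.95))
per_bin, edges = np.histogram(values, bins=n_bins, range=(lo, hi))

# Distinct occupancy levels: how many integers a single bin can end up holding
print(n_bins, np.unique(per_bin))
```

With bins slightly wider than 1, most bins cover one integer and a few cover two, so even perfectly flat data would show periodic spikes of double height.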
The ultimate solution is to use bins tailor-made for your problem: pass an array_like variable for bins, with each bin containing a single integer. I suggest using an np.arange(), shifted down by 0.5:
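# do what you're doing
data_range = [data.min(),data.max()]  # renamed to avoid shadowing the built-in range()
# one bin per integer: edges at k-0.5 and k+0.5 for every integer k in the data range
bins = np.arange(data_range[0],data_range[1]+2) - 0.5
# range= is ignored when explicit bin edges are given, so it is left out here
heights,edges = np.histogram(data, bins=bins)
edges = edges[:-1]+(edges[1]-edges[0]) # shift first x coordinate to edges[1]
fig, ax = plt.subplots()
ax.plot(edges,heights)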
And it's pretty as can be!
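As a sanity check on the integer-centred bins, the sketch below compares the histogram against a direct count from np.unique. It assumes synthetic data like the reconstruction above, but with a fixed seed so it is reproducible; centers and right_edges are names introduced here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, used only for reproducibility
data = np.floor(rng.normal(620, 20, 10000))
data = data[(560 < data) & (data < 650)]

# One bin per integer: edges sit at k - 0.5 and k + 0.5
bins = np.arange(data.min(), data.max() + 2) - 0.5
heights, edges = np.histogram(data, bins=bins)

# Midpoint of each bin; with these edges the midpoints are the integers themselves
centers = 0.5 * (edges[:-1] + edges[1:])

# The shift used in the snippets above yields the right-hand edge of each bin
right_edges = edges[:-1] + (edges[1] - edges[0])

# Direct count of each distinct value, for comparison
values, counts = np.unique(data, return_counts=True)
print(np.array_equal(heights[heights > 0], counts))
```

Plotting heights against centers instead of the shifted edges would place each point exactly on its integer value rather than half a unit to the right; whether that matters depends on how the x positions are used afterwards (for example in the click-based mask from the question).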
Upvotes: 3