Reputation: 353
I have data as a list of floats and I want to plot it as a histogram. The hist() function does the job perfectly for plotting an absolute histogram (raw counts). However, I cannot figure out how to show it in a relative frequency format; I would like the y-axis to show a fraction or, ideally, a percentage.
Here is the code:
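import matplotlib.pyplot as plt
import numpy as np

mydata = np.random.normal(size=1000)  # hypothetical stand-in for my list of floats

fig = plt.figure()
ax = fig.add_subplot(111)
# normed=1 is deprecated (removed in newer Matplotlib); density=True replaces it
n, bins, patches = ax.hist(mydata, bins=100, density=True, cumulative=False)
ax.set_xlabel('Bins', size=20)
ax.set_ylabel('Frequency', size=20)
plt.show()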
I thought the normed=1 argument would do it, but it gives fractions that are too high and sometimes greater than 1. They also seem to depend on the bin size, as if they were not normalized by the bin size. Nevertheless, when I set cumulative=1, it nicely sums up to 1. So, where is the catch? By the way, when I feed the same data into Origin and plot it, it gives me perfectly correct fractions. Thank you!
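One way to see the catch is a minimal NumPy-only sketch, assuming random normal data stands in for the asker's list (the seed and distribution are illustrative). With density=True (what normed=1 did), each height is count / (N * bin_width), so the total *area* is 1, not the sum of heights; multiplying each height by its bin width recovers the plain fractions.

```python
import numpy as np

# Hypothetical data standing in for the asker's list of floats
rng = np.random.default_rng(0)
mydata = rng.normal(loc=5.0, scale=2.0, size=1000)

# density=True: height = count / (N * bin_width), i.e. what normed=1 did
density, edges = np.histogram(mydata, bins=100, density=True)
widths = np.diff(edges)

# The heights alone depend on the bin width and can exceed 1 ...
print("sum of density heights:", density.sum())

# ... but height * width summed over all bins (the area) is 1
area = (density * widths).sum()
print("area under the bars:", area)

# Height * width for each bin is the relative frequency of that bin
fractions = density * widths
print("sum of fractions:", fractions.sum())
```

Because the heights are divided by the bin width, halving the width roughly doubles them, which matches the bin-size dependence described above.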
Upvotes: 35
Views: 61647
Reputation: 59
You can use numpy.histogram to get the histogram counts and bin edges, and then calculate the frequencies yourself. Finally, use a bar plot to draw the frequency histogram.
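import numpy as np
import matplotlib.pyplot as plt
p_hat = np.random.normal(size=1000)  # placeholder for your data
hist, edges = np.histogram(p_hat)
freq = hist / float(hist.sum())  # counts -> fractions that sum to 1
width = np.diff(edges)  # bin widths from the bin edges
plt.bar(edges[:-1], freq, width=width, align="edge", ec="k")  # x = left edge of each bin
plt.xlabel('x')
plt.ylabel('frequency')
plt.show()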
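A small sketch of the same idea with the y-axis in percent, assuming uniform random data as a hypothetical p_hat and the non-interactive Agg backend so it runs without a display. With align="edge", the x value passed to bar() is each bar's left edge, so the left bin edges edges[:-1] are what line the bars up with the bins.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for p_hat
rng = np.random.default_rng(1)
p_hat = rng.uniform(0.0, 1.0, size=500)

hist, edges = np.histogram(p_hat, bins=10)
freq = hist / hist.sum()      # fraction of all samples in each bin
widths = np.diff(edges)       # one width per bin

fig, ax = plt.subplots()
# align="edge": x is the LEFT edge of each bar, so pass edges[:-1]
bars = ax.bar(edges[:-1], freq * 100, width=widths, align="edge", ec="k")
ax.set(xlabel="x", ylabel="frequency (%)")

# Read back where the bars were actually placed
left_edges = [b.get_x() for b in bars]
```

Passing edges[1:] instead would shift every bar one bin to the right.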
Upvotes: 1
Reputation: 1759
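For a normalized histogram, set the option density=True (the replacement for normed). Note that density=True scales the bars so that their total area is 1; the bar heights equal relative frequencies only when every bin has width 1. The figure below shows a histogram for 1000 samples taken from a normal distribution with mean 5 and standard deviation 2.0.
The code is: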
import numpy as np
import matplotlib.pyplot as plt
# Generate data from a normal distribution
mu, sigma = 5, 2.0 # mean and standard deviation
mydata = np.random.normal(mu, sigma, 1000)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(mydata, bins=100, density=True)
plt.show()
If you want % on the y-axis, you can use PercentFormatter, as shown below:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
# Generate data from a normal distribution
mu, sigma = 5, 2.0 # mean and standard deviation
mydata = np.random.normal(mu, sigma, 1000)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(mydata, bins=100, density=False)
# xmax is the value shown as 100%; use the total count so raw counts become percentages
ax.yaxis.set_major_formatter(PercentFormatter(xmax=len(mydata)))
plt.show()
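An alternative sketch, assuming random normal data and the Agg backend for headless use: give each sample the weight 1/N so the bar heights are true fractions, then tell PercentFormatter that 1.0 means 100% with xmax=1. Here the bars themselves are rescaled, not just the tick labels.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for running the sketch
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter

# Hypothetical data standing in for mydata
rng = np.random.default_rng(2)
mydata = rng.normal(5.0, 2.0, 1000)

fig, ax = plt.subplots()
# Each sample contributes 1/N, so the bar heights are fractions in [0, 1]
n, bins, patches = ax.hist(mydata, bins=100,
                           weights=np.ones(len(mydata)) / len(mydata))
# xmax=1: a height of 1.0 is labeled 100%
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1))
ax.set_ylabel("Percentage of samples")
```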
Upvotes: 0
Reputation: 8538
Because the normed option of hist returns the density of points, i.e. dN/dx, not the fraction of points per bin.
What you need is something like this:
import numpy as np
# assuming that mydata is a NumPy array
# give every sample the weight 1/N so each bar height is a fraction of the total
ax.hist(mydata, weights=np.ones_like(mydata) / mydata.size)
# this will give you fractions
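To check that the weights trick is independent of the bin count (unlike normed/density), here is a small sketch assuming random normal data and the Agg backend: the returned heights n sum to 1 for both 10 and 100 bins.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for running the sketch
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for mydata
rng = np.random.default_rng(3)
mydata = rng.normal(5.0, 2.0, 1000)
weights = np.ones_like(mydata) / mydata.size  # each sample counts as 1/N

fig, ax = plt.subplots()
totals = {}
for nbins in (10, 100):
    # n holds the bar heights; with these weights they are fractions
    n, bins, _ = ax.hist(mydata, bins=nbins, weights=weights, histtype="step")
    totals[nbins] = n.sum()
```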
Upvotes: 64
Reputation: 35269
Or you can use set_major_formatter to adjust the scale of the y-axis, as follows:
from matplotlib import ticker as tick

def adjust_y_axis(x, pos):
    # relabel each y tick (a raw count) as a fraction of all samples
    return x / (len(mydata) * 1.0)

ax.yaxis.set_major_formatter(tick.FuncFormatter(adjust_y_axis))
Just register the formatter as above before calling plt.show(). This only changes the tick labels; the bars still hold raw counts.
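A percent variant of the same FuncFormatter idea, sketched under the assumptions of random normal data and the Agg backend; count_to_percent is a hypothetical helper name. The bars keep their raw counts and only the tick text is rewritten.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for running the sketch
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import ticker

# Hypothetical data standing in for mydata
rng = np.random.default_rng(4)
mydata = rng.normal(5.0, 2.0, 1000)

fig, ax = plt.subplots()
counts, bins, _ = ax.hist(mydata, bins=100)  # raw counts; bars are NOT rescaled

def count_to_percent(y, pos):
    """Hypothetical helper: label a tick at count y as a percentage of all samples."""
    return f"{100.0 * y / len(mydata):.1f}%"

ax.yaxis.set_major_formatter(ticker.FuncFormatter(count_to_percent))
```

Because only the labels change, anything that reads the bar heights (such as counts) still sees raw counts.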
Upvotes: 5