Ziva

Reputation: 3511

Python: matplotlib - probability mass function as histogram

I want to draw a histogram and a line plot on the same graph. To do that, I need the histogram to be a probability mass function, so the y-axis should show probability values. I don't know how to do that, because using the normed option didn't help. Below is my source code and a sample of the data. I would be very grateful for any suggestions.

data = [12565, 1342, 5913, 303, 3464, 4504, 5000, 840, 1247, 831, 2771, 4005, 1000, 1580, 7163, 866, 1732, 3361, 2599, 4006, 3583, 1222, 2676, 1401, 2598, 697, 4078, 5016, 1250, 7083, 3378, 600, 1221, 2511, 9244, 1732, 2295, 469, 4583, 1733, 1364, 2430, 540, 2599, 12254, 2500, 6056, 833, 1600, 5317, 8333, 2598, 950, 6086, 4000, 2840, 4851, 6150, 8917, 1108, 2234, 1383, 2174, 2376, 1729, 714, 3800, 1020, 3457, 1246, 7200, 4001, 1211, 1076, 1320, 2078, 4504, 600, 1905, 2765, 2635, 1426, 1430, 1387, 540, 800, 6500, 931, 3792, 2598, 5033, 1040, 1300, 1648, 2200, 2025, 2201, 2074, 8737, 324]
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rc('xtick',labelsize=12)
plt.rc('ytick',labelsize=12)
plt.xlabel("Incomes")
plt.hist(data, bins=50, color="blue", alpha=0.5, normed=True)
plt.show() 

Upvotes: 7

Views: 10307

Answers (2)

Tyler Acorn

Reputation: 75

This is old, but since I found it and was about to use it before I noticed some mistakes, I figured I'd add a comment with a couple of fixes. In the example, @mmdanziger passes the bin edges to plt.bar; however, with plt.bar's default center alignment you actually need to pass the bin centers (or set align="edge"). They also assume that the bins are of equal width, which is fine "most" of the time. But you can also pass an array of widths, which keeps you from inadvertently making a mistake. So here's a more complete example:

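import numpy as np
import matplotlib.pyplot as plt

heights, bins = np.histogram(data, bins=50)
heights = heights / heights.sum()  # counts -> probabilities that sum to 1
bin_centers = 0.5 * (bins[1:] + bins[:-1])
bin_widths = np.diff(bins)  # also correct when the bins have unequal widths
plt.bar(bin_centers, heights, width=bin_widths, color="blue", alpha=0.5)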

@mmdanziger's other option, passing weights = np.ones_like(data)/len(data) to plt.hist(), does the same thing, and for many it is the easier approach.
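To check that the two approaches agree, you can compare the numbers without plotting anything. This is only a minimal sketch: the short `sample` list is a made-up stand-in for `data`, and `np.histogram` is called directly on the assumption that `plt.hist` bins and weights the values the same way.

```python
import numpy as np

# Hypothetical stand-in for the `data` list in the question.
sample = [600, 840, 1000, 1250, 1732, 2598, 2599, 4005, 5000, 12565]

# Approach 1: count per bin, then divide by the total count.
counts, edges = np.histogram(sample, bins=5)
pmf_manual = counts / counts.sum()

# Approach 2: give every observation a weight of 1/N.
weights = np.ones(len(sample)) / len(sample)
pmf_weighted, _ = np.histogram(sample, bins=5, weights=weights)

print(pmf_manual)
print(np.allclose(pmf_manual, pmf_weighted))  # both give the same heights
print(pmf_manual.sum())                       # heights of a PMF add up to 1
```

Either set of heights can then be passed to plt.bar as in the example above.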

Upvotes: 1

mmdanziger

Reputation: 4658

As far as I know, matplotlib does not have this function built in. However, it is easy enough to replicate:

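    import numpy as np
    import matplotlib.pyplot as plt

    heights, bins = np.histogram(data, bins=50)
    heights = heights / heights.sum()  # counts -> probabilities
    # np.histogram returns len(heights) + 1 edges; this assumes equal-width bins
    width = (bins[-1] - bins[0]) / len(heights)
    plt.bar(bins[:-1], heights, width=width, align="edge", color="blue", alpha=0.5)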

Edit: Here is another approach from a similar question:

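    weights = np.ones_like(data, dtype=float) / len(data)  # each observation counts as 1/N
    # no normed argument: it was removed in newer matplotlib, and the weights already normalize
    plt.hist(data, bins=50, weights=weights, color="blue", alpha=0.5)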
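For context on why the normed option does not give probabilities: it normalizes to a probability density, so the bar areas sum to 1 rather than the bar heights. The sketch below illustrates the difference with NumPy only. The `sample` list is made up for illustration, and `density=True` is used here as what I understand to be the replacement for `normed` in newer releases.

```python
import numpy as np

# Hypothetical stand-in for the `data` list in the question.
sample = [600, 840, 1000, 1250, 1732, 2598, 2599, 4005, 5000, 12565]

counts, edges = np.histogram(sample, bins=5)
density, _ = np.histogram(sample, bins=5, density=True)
widths = np.diff(edges)

# Probability mass per bin: fraction of observations that fall in it.
pmf = counts / counts.sum()

print(density.sum())                       # heights alone do not add up to 1
print((density * widths).sum())            # areas (height * width) do
print(np.allclose(density * widths, pmf))  # density * width recovers the PMF
```

With incomes in the thousands the bins are wide, so the density heights come out tiny, which is why the y-axis does not look like probabilities.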

Upvotes: 11
