Reyhaneh

Reputation: 409

Plot two histograms with different numbers of data points on one graph using matplotlib

I have two sets of data, one containing around 11 million data points and the other around 5000. I would like to plot them both on one histogram, but because of the difference in size I need to normalise the frequencies so that they fit on the same figure. Below I have simulated what I did with my data in order to plot them. I used normed=True.

from numpy.random import randn
import matplotlib.pyplot as plt
import random

datalist1=[]
for x in range(1,50000):
  datalist1.append(random.uniform(1,2))

datalist2=randn(5000000)


fig= plt.figure(1)

plt.hist(datalist1,bins=20,color='b',alpha=0.3,label='theoretical',histtype='stepfilled', normed=True)
plt.hist(datalist2,bins=20,alpha=0.5,color='g',label='experimental',histtype='stepfilled',normed=True)
plt.xlabel("Value")
plt.ylabel("Normalised Frequency")
plt.legend()
plt.show()


Can you please tell me if this is a good way to get around this issue? I would like the tallest bar in each of the two histograms to have a height of 1 (or 100%).

Upvotes: 0

Views: 1941

Answers (1)

MB-F

Reputation: 23637

The normed=True setting normalizes each histogram so that its total area is 1. Each histogram can then be read as an estimate of a probability density function.

In short, it actually makes sense to normalize on the area rather than on the peak, because the area does not depend on how many samples each data set contains.
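To make the area normalization concrete, here is a minimal sketch using NumPy only. It assumes a recent NumPy, where the keyword is density=True (newer Matplotlib releases likewise use density in place of normed). The helper names density_hist and hist_area and the sample sizes are made up for illustration.

```python
import numpy as np


def density_hist(data, bins=20):
    """Area-normalised histogram: bar heights times bin widths sum to 1."""
    heights, edges = np.histogram(data, bins=bins, density=True)
    return heights, edges


def hist_area(heights, edges):
    """Total area under the histogram bars."""
    return float(np.sum(heights * np.diff(edges)))


# Illustrative data only: two samples of very different size.
rng = np.random.default_rng(0)
small = rng.uniform(1, 2, 5_000)
large = rng.standard_normal(1_000_000)

for name, data in (("small", small), ("large", large)):
    heights, edges = density_hist(data)
    # The area is 1 (up to rounding) for both samples; the peaks differ.
    print(name, "area:", hist_area(heights, edges), "peak:", heights.max())
```

Both areas come out as 1 up to floating-point rounding, while the peak heights differ because the two distributions have different widths.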

But if you really want to normalize by height, you can rescale the polygon data of each histogram after plotting it:

h = plt.hist(datalist1,bins=20,color='b',alpha=0.3,label='theoretical',histtype='stepfilled', normed=True)
p = h[2][0]
p.xy[:,1] /= p.xy[:, 1].max()
h = plt.hist(datalist2,bins=20,alpha=0.5,color='g',label='experimental',histtype='stepfilled',normed=True)
p = h[2][0]
p.xy[:,1] /= p.xy[:, 1].max()

This solution feels a bit hackish, but at least it's quick and dirty :)
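A possibly less hackish alternative, sketched here under assumptions rather than taken from the original answer: compute the bin counts with numpy.histogram, divide by the largest count, and draw the bars yourself. The function names peak_normalised_hist and plot_peak_normalised are hypothetical, and plt.bar is swapped in for the stepfilled plt.hist polygon.

```python
import numpy as np


def peak_normalised_hist(data, bins=20):
    """Return (heights, edges) with the tallest bin scaled to exactly 1."""
    counts, edges = np.histogram(data, bins=bins)
    return counts / counts.max(), edges


def plot_peak_normalised(datasets, bins=20):
    """Overlay peak-normalised histograms.

    datasets maps a legend label to a (data, colour) pair.
    """
    # Imported here so the numeric helper above needs only NumPy.
    import matplotlib.pyplot as plt

    for label, (data, colour) in datasets.items():
        heights, edges = peak_normalised_hist(data, bins)
        # align="edge" puts each bar's left side on the bin's left edge.
        plt.bar(edges[:-1], heights, width=np.diff(edges), align="edge",
                color=colour, alpha=0.4, label=label)
    plt.xlabel("Value")
    plt.ylabel("Frequency relative to tallest bin")
    plt.legend()
    plt.show()
```

Calling plot_peak_normalised({"theoretical": (datalist1, "b"), "experimental": (datalist2, "g")}) with the lists from the question should draw both histograms with their tallest bars at 1, without touching the patch internals.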

Upvotes: 1
