Reputation: 162
I have an original data sample and a simulated version of it (don't ask me how I simulated it), and I want to check whether their histograms match. The best way seems to be a qqplot, but the statsmodels library does not allow samples of different sizes.
Upvotes: 1
Views: 6067
Reputation: 1297
Constructing a qq plot involves finding corresponding quantiles in both sets and plotting them against one another. In the case where one set is larger than the other, common practice is to take the quantile levels of the smaller set, and use linear interpolation to estimate the corresponding quantiles in the larger set. This is described here: http://www.itl.nist.gov/div898/handbook/eda/section3/qqplot.htm
This is relatively straightforward to do manually:
import numpy as np
import pylab

test1 = np.random.normal(0, 1, 1000)
test2 = np.random.normal(0, 1, 800)

# Calculate quantiles: sort each sample and assign level i/n to the i-th value
test1.sort()
quantile_levels1 = np.arange(len(test1), dtype=float) / len(test1)

test2.sort()
quantile_levels2 = np.arange(len(test2), dtype=float) / len(test2)

# Use the smaller set of quantile levels to create the plot
quantile_levels = quantile_levels2

# We already have the set of quantiles for the smaller data set
quantiles2 = test2

# We find the set of quantiles for the larger data set using linear interpolation
quantiles1 = np.interp(quantile_levels, quantile_levels1, test1)

# Plot the quantiles to create the qq plot
pylab.plot(quantiles1, quantiles2)

# Add a reference line
maxval = max(test1[-1], test2[-1])
minval = min(test1[0], test2[0])
pylab.plot([minval, maxval], [minval, maxval], 'k-')

pylab.show()
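If you need this more than once, the same steps can be wrapped in a small helper that does not care which sample is larger. This is only a sketch: the function name `qq_points` is made up for illustration, and it keeps the same i/n quantile-level convention as the code above rather than any particular library's definition.

```python
import numpy as np

def qq_points(x, y):
    """Return two equal-length arrays of matched quantiles for samples x and y.

    Hypothetical helper: the larger sample is linearly interpolated at the
    quantile levels of the smaller one; the smaller sample is used as-is.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))

    # Quantile level of the i-th sorted value is i/n (same convention as above)
    levels_x = np.arange(len(x)) / len(x)
    levels_y = np.arange(len(y)) / len(y)

    if len(x) > len(y):
        # x is larger: estimate its quantiles at y's levels
        x = np.interp(levels_y, levels_x, x)
    elif len(y) > len(x):
        # y is larger: estimate its quantiles at x's levels
        y = np.interp(levels_x, levels_y, y)

    return x, y
```

The two returned arrays can then be passed to pylab.plot in place of quantiles1 and quantiles2, for example pylab.plot(qx, qy, 'o') to draw points rather than a line.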
Upvotes: 7