Reputation: 162

How to create a qq plot between two samples of different sizes in Python?

I have an original data sample and a simulated version of it (don't ask me how I simulated it), and I want to check whether their histograms match. The best way seems to be a qq plot, but the statsmodels library does not allow samples of different sizes.

Upvotes: 1

Answers (1)

Emma

Reputation: 1297

Constructing a qq plot involves finding corresponding quantiles in both sets and plotting them against one another. When one set is larger than the other, common practice is to take the quantile levels of the smaller set and use linear interpolation to estimate the corresponding quantiles in the larger set. This is described here: http://www.itl.nist.gov/div898/handbook/eda/section3/qqplot.htm

This is relatively straightforward to do manually:

import numpy as np
import matplotlib.pyplot as plt

test1 = np.random.normal(0, 1, 1000)
test2 = np.random.normal(0, 1, 800)

# Calculate the quantile levels of each sorted sample
test1.sort()
quantile_levels1 = np.arange(len(test1), dtype=float) / len(test1)

test2.sort()
quantile_levels2 = np.arange(len(test2), dtype=float) / len(test2)

# Use the quantile levels of the smaller set (test2 here) to create the plot
quantile_levels = quantile_levels2

# The sorted smaller sample already gives its quantiles at those levels
quantiles2 = test2

# Find the matching quantiles of the larger set using linear interpolation
quantiles1 = np.interp(quantile_levels, quantile_levels1, test1)

# Plot the quantiles against each other as points to create the qq plot
plt.plot(quantiles1, quantiles2, 'o', markersize=3)

# Add a y = x reference line spanning the range of both samples
maxval = max(test1[-1], test2[-1])
minval = min(test1[0], test2[0])
plt.plot([minval, maxval], [minval, maxval], 'k-')

plt.show()
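
The steps above assume you already know which sample is the smaller one. As a rough sketch, the same idea can be wrapped in a small helper that works for either ordering. The function name `qq_points` and the mid-point levels `(i + 0.5) / n` are my own choices for illustration, not part of any library, and they differ slightly from the `i / n` levels used above. The helper relies on `np.quantile`, which interpolates linearly between sorted values by default.

```python
import numpy as np


def qq_points(sample_a, sample_b):
    """Return matched quantiles of two 1-D samples of possibly different sizes.

    Hypothetical helper for illustration only.
    """
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)

    # Use as many quantile levels as the smaller sample has points.
    n = min(len(a), len(b))

    # Mid-point levels (i + 0.5) / n: an assumed convention that keeps the
    # levels strictly between 0 and 1.
    levels = (np.arange(n) + 0.5) / n

    # np.quantile sorts internally and, by default, interpolates linearly
    # between neighbouring sorted values.
    return np.quantile(a, levels), np.quantile(b, levels)


rng = np.random.default_rng(0)
qa, qb = qq_points(rng.normal(0, 1, 1000), rng.normal(0, 1, 800))
```

Plotting `qa` against `qb` as points, together with a y = x reference line, gives the qq plot. Points lying close to that line suggest the two samples follow similar distributions.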

Upvotes: 7
