Reputation: 6483
I am trying to get scipy.stats.probplot to draw a Q-Q plot against a custom distribution. Basically, I have a bunch of numeric variables (all numpy arrays) and I want to check for distributional differences with a Q-Q plot.
My dataframe df looks something like this:
some_var another_var
1 16.5704 3.3620
2 12.8373 -8.2204
3 8.1854 1.9617
4 13.5683 1.8376
5 8.5143 2.3173
6 6.0123 -7.7536
7 9.6775 -4.3874
... ... ...
189499 11.8561 -8.4887
189500 10.0422 -4.6228
According to the reference:
dist : str or stats.distributions instance, optional
Distribution or distribution function name. The default is ‘norm’ for a normal probability plot. Objects that look enough like a stats.distributions instance (i.e. they have a ppf method) are also accepted.
Of course, a numpy array doesn't have a ppf method, so when I try the following:
import scipy.stats as stats
from matplotlib import pylab
# X is the dataframe; dist is (incorrectly) given a raw numpy array
stats.probplot(X[X.columns[1]].values, dist=X[X.columns[2]].values, plot=pylab)
I get the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'ppf'
(N.B. if I do not use the .values attribute, I get the same error, but for a 'Series' object instead of 'numpy.ndarray'.)
So, the question is: what is an object with a ppf method and how do I create it from my numpy array?
Upvotes: 0
Views: 3726
Reputation: 68186
The "dist" object should be an instance or class of scipy's statistical distributions. That is what is meant by:
dist : str or stats.distributions instance, optional
So a self-contained example would be:
import numpy
from matplotlib import pyplot
from scipy import stats
random_beta = numpy.random.beta(0.3, 2, size=37)
fig, ax = pyplot.subplots(figsize=(6, 3))
_ = stats.probplot(
    random_beta,         # data
    sparams=(0.3, 2),    # guesses at the distribution's parameters
    dist=stats.beta,     # the "dist" object
    plot=ax              # where the data should be plotted
)
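And you'll get a plot of the ordered sample values against the theoretical beta quantiles, along with a least-squares fit line.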
If you want to plot multiple columns of a data frame, you'll need to call probplot multiple times, plotting on the same (or new) axes each time.
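As a rough sketch of that loop: the data frame below is a made-up stand-in (the column names are borrowed from the question, the values are random), and comparing against "norm" is only an assumption for illustration. Each column gets its own axes here; passing the same ax on every call would overlay them instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import numpy as np
import pandas as pd
from matplotlib import pyplot
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data frame
df = pd.DataFrame({
    "some_var": rng.normal(10, 3, size=200),
    "another_var": rng.normal(-2, 5, size=200),
})

fig, axes = pyplot.subplots(ncols=df.shape[1], figsize=(8, 3))

fits = {}
for ax, col in zip(axes, df.columns):
    # One probplot call per column, each drawn on its own axes
    _, (slope, intercept, r) = stats.probplot(
        df[col].to_numpy(), dist="norm", plot=ax
    )
    ax.set_title(col)
    fits[col] = (slope, intercept, r)
```

The fits dictionary keeps the slope, intercept and correlation coefficient of each fit line, which can be handy if you have many columns and only want to look closely at the ones that deviate.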
In this simple case, the probscale package doesn't offer much. But it might be more flexible for doing probability scales instead of quantile scales if that's a direction you might head in the future:
import probscale
fig, ax = pyplot.subplots(figsize=(6, 3))
fig = probscale.probplot(
    random_beta,
    ax=ax,
    plottype='qq',
    bestfit=True,
    dist=stats.beta(0.3, 2)
)
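Coming back to the original case of comparing one array against another: the docs quoted in the question say that any object with a ppf method is accepted, so one option is a small wrapper whose ppf returns empirical quantiles of the reference sample. This is only a minimal sketch. EmpiricalDist is a hypothetical helper, not part of scipy, the two arrays are random stand-ins for your columns, and np.quantile with its default interpolation is just one reasonable choice of empirical quantile function.

```python
import numpy as np
from scipy import stats


class EmpiricalDist:
    """Hypothetical helper: wraps a sample and exposes only ppf()."""

    def __init__(self, sample):
        self.sample = np.sort(np.asarray(sample, dtype=float))

    def ppf(self, q):
        # Empirical quantile function of the reference sample
        return np.quantile(self.sample, q)


rng = np.random.default_rng(0)
x = rng.normal(10, 3, size=500)    # stand-in for the column being checked
ref = rng.normal(10, 3, size=800)  # stand-in for the reference column

# Without plot=..., probplot just returns the quantile pairs and the fit
(osm, osr), (slope, intercept, r) = stats.probplot(x, dist=EmpiricalDist(ref))
```

Here osm holds the reference-sample quantiles and osr the sorted values of x. If the two samples come from similar distributions, the points should fall close to the line y = x, so a slope near 1 and an intercept near 0 are what you'd hope to see. Passing plot=ax, as in the beta example, should draw the same thing on a matplotlib axes.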
Upvotes: 2