Reputation: 533
Can anyone give me a way to do a qq plot in Seaborn as a test for normality of data? Or failing that, at least in matplotlib.
Thanks in advance
Upvotes: 10
Views: 38327
Reputation: 48
The seaborn-qqplot addon documentation shows an example.
Working with PyCharm on Windows 10, I had difficulty installing the library with:
pip install seaborn-qqplot
in my virtual environment. The import line:
from seaborn_qqplot import pplot
was not recognized.
After installing the package through PyCharm itself (File -> Settings -> Project -> Python Interpreter -> + (Install)), I could import pplot from seaborn_qqplot and create a quantile-quantile plot.
Upvotes: 0
Reputation: 196
Using the same data as above, this example shows a normal distribution plotted against a normal distribution, resulting in a fairly straight line:
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
a = np.random.normal(5, 5, 250)
sm.qqplot(a)
plt.show()
This example shows a Rayleigh distribution plotted against a normal distribution, resulting in a slightly concave curve:
a = np.random.rayleigh(5, 250)
sm.qqplot(a)
plt.show()
Upvotes: 12
Reputation: 1263
I'm not sure if this is still relevant, but I notice that neither of the answers really addresses the question, which asks how to do Q-Q plots with scipy and seaborn and doesn't mention statsmodels. In fact, Q-Q plots are available in scipy under the name probplot:
import numpy as np
from scipy import stats
import seaborn as sns
x = np.random.normal(5, 5, 250)  # example sample; replace with your own data
stats.probplot(x, plot=sns.mpl.pyplot)
The plot argument to probplot can be anything that has a plot method and a text method. probplot is also quite flexible about the kinds of theoretical distributions it supports.
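To illustrate that flexibility, here is a small sketch that calls probplot without a plot argument, in which case it only returns the numbers. The sample data and variable names are assumptions chosen for the example. The return value is a pair: the theoretical and ordered sample quantiles, and a least-squares fit (slope, intercept, r) through them.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
x = rng.normal(5, 5, 250)

# osm: theoretical quantiles, osr: ordered sample values.
# slope, intercept and r describe the straight-line fit through the points.
(osm, osr), (slope, intercept, r) = stats.probplot(x, dist="norm")
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.4f}")

# The dist argument selects the theoretical distribution to compare against.
# An exponential sample should line up better with "expon" than with "norm".
y = rng.exponential(2.0, 250)
_, (_, _, r_expon) = stats.probplot(y, dist="expon")
_, (_, _, r_norm) = stats.probplot(y, dist="norm")
print(f"r against expon={r_expon:.4f}, r against norm={r_norm:.4f}")
```

For a normal sample the slope and intercept roughly estimate the standard deviation and mean, and r close to 1 indicates a nearly straight Q-Q plot.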
Upvotes: 9
Reputation: 339795
After reading the Wikipedia article, I understand that a Q-Q plot is a plot of the quantiles of two distributions against each other. numpy.percentile returns the percentiles of a sample. Hence you can call numpy.percentile on each of the two samples and plot the results against each other.
import numpy as np
import matplotlib.pyplot as plt
a = np.random.normal(5,5,250)
b = np.random.rayleigh(5,250)
percs = np.linspace(0,100,21)
qn_a = np.percentile(a, percs)
qn_b = np.percentile(b, percs)
plt.plot(qn_a,qn_b, ls="", marker="o")
x = np.linspace(np.min((qn_a.min(),qn_b.min())), np.max((qn_a.max(),qn_b.max())))
plt.plot(x,x, color="k", ls="--")
plt.show()
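The code above compares two samples. For the normality check the question asks about, the second sample can be replaced by the theoretical quantiles of a standard normal. The following is a rough sketch of that idea using only numpy and the standard library; the function name normal_qq_points and the plotting positions (i - 0.5) / n are choices made for this example, not something taken from the answer.

```python
import numpy as np
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical, observed) quantiles for a normal Q-Q plot.

    normal_qq_points is a hypothetical helper written for this sketch.
    """
    observed = np.sort(np.asarray(sample, dtype=float))
    n = observed.size
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1),
    # where the normal quantile function is finite.
    probs = (np.arange(1, n + 1) - 0.5) / n
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = np.array([std_normal.inv_cdf(p) for p in probs])
    return theoretical, observed


rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
theo, obs = normal_qq_points(rng.normal(5, 5, 250))

# Correlation between the two sets of quantiles: near 1 for normal data.
r = np.corrcoef(theo, obs)[0, 1]
print(f"r = {r:.4f}")

# To draw it with matplotlib:
#   plt.plot(theo, obs, ls="", marker="o")
```

Points that fall close to a straight line suggest the sample is approximately normal; the slope and intercept of that line reflect the sample's standard deviation and mean.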
Upvotes: 22