Reputation: 1189
I have 2 sets of data of float numbers, set A and set B. Both of them are matrices of size 40*40. I would like to find out which set is closer to the normal distribution. I know how to use probplot() in matlab to plot the probability of one set. However, I do not know how to find out the level of the fitness of the distribution is.
In python, when people use problot, a parameter ,R^2, shows how good the distribution of the data is against to the normal distribution. The closer the R^2 value to value 1, the better the fitness is. Thus, I can simply use the function to compare two set of data by their R^2 value. However, because of some machine problem, I can not use the python in my current machine. Is there such parameter or function similar to the R^2 value in matlab ?
Thank you very much,
Upvotes: 2
Views: 1100
Reputation: 18570
The approach of @nate (+1) is definitely one possible way of going about this problem. However, the statistician in me is compelled to suggest the following alternative (that does, alas, require the statistics toolbox - but you have this if you have the student version):
Given that your data is Normal (not Multivariate normal), consider using the Jarque-Bera test.
Jarque-Bera tests the null hypothesis that a given dataset is generated by a Normal distribution, versus the alternative that it is generated by some other distribution. If the Jarque-Bera test statistic is less than some critical value, then we fail to reject the null hypothesis.
So how does this help with the goodness-of-fit problem? Well, the larger the test statistic, the more "non-Normal" the data is. The smaller the test statistic, the more "Normal" the data is.
So, assuming you have converted your matrices into two vectors, A
and B
(each should be 1600 by 1 based on the dimensions you provide in the question), you could do the following:
%# Build sample data
A = randn(1600, 1);
B = rand(1600, 1);
%# Perform JB test
[ANormal, ~, AStat] = jbtest(A);
[BNormal, ~, BStat] = jbtest(B);
%# Display result
if AStat < BStat
disp('A is closer to normal');
else
disp('B is closer to normal');
end
As a little bonus of doing things this way, ANormal
and BNormal
tell you whether you can reject or fail to reject the null hypothesis that the sample in A
or B
comes from a normal distribution! Specifically, if ANormal
is 1, then you fail to reject the null (ie the test statistic indicates that A
is probably drawn from a Normal). If ANormal
is 0, then the data in A
is probably not generated from a Normal distribution.
CAUTION: The approach I've advocated here is only valid if A
and B
are the same size, but you've indicated in the question that they are :-)
Upvotes: 1
Reputation: 26069
Fitting a curve or surface to data and obtaining the goodness of fit, i.e., sse, rsquare, dfe, adjrsquare, rmse, can be done using the function fit
. More info here...
Upvotes: 1