Reputation: 89
I'm using the Kolmogorov-Smirnov test in MATLAB to determine the normality of each column of a data matrix prior to performing generalised linear regression. An example data vector is:
data = [8126,3163,9129,5399,8682,1126,1053,7805,2989,2758,3277,1152,6994,6833];
The test runs and gives me a result. However, when I plot the empirical cumulative distribution function (cdf) (blue) against the standard normal cdf (red) for a visual comparison, the data values are so large that the graph is not useful:
The code used to plot this figure is:
[h,p,ksstat,cv] = kstest(data);
[f,x_values] = ecdf(data);
figure()
F = plot(x_values,f);
set(F,'LineWidth',2);
hold on
G = plot(x_values,normcdf(x_values,0,1),'r-');
set(G,'LineWidth',2);
legend([F G],...
'Empirical CDF','Standard Normal CDF',...
'Location','SE');
Does this mean the result of my test is not valid? If so, can I just normalise the data, e.g.
dataN=(data-min(data))./(max(data)-min(data));
while maintaining test validity?
Thank you for your time,
Laura
Upvotes: 3
Views: 1049
Reputation: 89
Thanks to Luis Mendo I solved this problem. normcdf takes the mean and standard deviation of the reference normal distribution as inputs, and I had left them at 0 and 1 from the example code I was working from instead of using the mean and standard deviation of the data vector. The edited code is:
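% kstest compares against the standard normal by default, so standardise first
[h,p,ksstat,cv] = kstest((data-mean(data))./std(data));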
[f,x_values] = ecdf(data);
figure()
F = plot(x_values,f);
set(F,'LineWidth',2);
hold on
variableMean = mean(data);
variableSD = std(data);
G = plot(x_values,normcdf(x_values,variableMean,variableSD),'r-');
set(G,'LineWidth',2);
legend([F G],...
'Empirical CDF','Fitted Normal CDF',...
'Location','SE');
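On the original question of whether min-max normalisation keeps the test valid, here is a minimal sketch rather than a definitive recipe. It assumes the Statistics Toolbox functions kstest and lillietest are available; the names dataMinMax and dataZ are only illustrative. With no reference distribution given, kstest compares against the standard normal, so the scaling needs to produce mean 0 and standard deviation 1, which min-max scaling does not do.

```matlab
data = [8126,3163,9129,5399,8682,1126,1053,7805,2989,2758,3277,1152,6994,6833];

% Min-max scaling maps the data onto [0,1]. That is not mean 0 / SD 1,
% so the default kstest reference (standard normal) is still the wrong one.
dataMinMax = (data - min(data)) ./ (max(data) - min(data));

% Z-scoring gives mean 0 and SD 1, matching the default reference.
dataZ = (data - mean(data)) ./ std(data);

[hRaw, pRaw]       = kstest(data);        % raw data vs standard normal
[hMinMax, pMinMax] = kstest(dataMinMax);  % min-max scaled vs standard normal
[hZ, pZ]           = kstest(dataZ);       % z-scored vs standard normal

% The mean and SD are estimated from the same sample, which makes the
% plain K-S test conservative. lillietest is designed for that case.
[hL, pL] = lillietest(data);

fprintf('raw:        h = %d, p = %.4g\n', hRaw, pRaw);
fprintf('min-max:    h = %d, p = %.4g\n', hMinMax, pMinMax);
fprintf('z-scored:   h = %d, p = %.4g\n', hZ, pZ);
fprintf('lillietest: h = %d, p = %.4g\n', hL, pL);
```

Comparing the four results side by side shows how much the conclusion depends on the reference distribution, not just on the data.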
Upvotes: 3