Laura

Reputation: 89

Kolmogorov-Smirnov test for normality in MATLAB - data normalisation?

I'm using the Kolmogorov-Smirnov test in MATLAB to determine the normality of each column of a data matrix prior to performing generalised linear regression. An example data vector is:

data = [8126,3163,9129,5399,8682,1126,1053,7805,2989,2758,3277,1152,6994,6833];

The test runs and returns a result. However, when I plot the empirical cumulative distribution function (CDF, blue) against the standard normal CDF (red) for a visual comparison, the data values are so large relative to the standard normal scale that the graph is not useful:

[Figure: exampleCDF, empirical CDF (blue) and standard normal CDF (red)]

The code used to plot this figure is:

[h,p,ksstat,cv] = kstest(data);
[f,x_values] = ecdf(data);
figure()
F = plot(x_values,f);
set(F,'LineWidth',2);
hold on
G = plot(x_values,normcdf(x_values,0,1),'r-');
set(G,'LineWidth',2);
legend([F G],...
    'Empirical CDF','Standard Normal CDF',...
    'Location','SE');

Does this mean the result of my test is not valid? If so, can I simply normalise the data, e.g.

dataN=(data-min(data))./(max(data)-min(data)); 

while maintaining test validity?

Thank you for your time,

Laura

Upvotes: 3

Views: 1049

Answers (1)

Laura

Reputation: 89

Thanks to Luis Mendo I solved this problem. normcdf takes the mean and standard deviation of the reference normal distribution as inputs. I had left these at 0 and 1 from the example code I was working from, instead of setting them to the mean and standard deviation of my data vector. The edited code is:

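% Note: kstest(data) with no further arguments still tests against N(0,1)
[h,p,ksstat,cv] = kstest(data);
% Empirical CDF of the data
[f,x_values] = ecdf(data);
figure()
F = plot(x_values,f);
set(F,'LineWidth',2);
hold on
% Normal CDF with the sample mean and standard deviation of the data
variableMean = mean(data);
variableSD = std(data);
G = plot(x_values,normcdf(x_values,variableMean,variableSD),'r-');
set(G,'LineWidth',2);
% The red curve is no longer the standard normal, so it is labelled as fitted
legend([F G],...
    'Empirical CDF','Fitted Normal CDF',...
    'Location','SE');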
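On the original question of whether the data can be normalised before testing: by default kstest compares the sample against the standard normal, so the comparison is only meaningful on a scale with mean 0 and standard deviation 1. Min-max scaling maps the data to [0,1] and does not give that; z-scoring does. The following is a minimal sketch of that idea rather than a definitive procedure. It assumes the Statistics and Machine Learning Toolbox is available, and the variable names (dataZ, hRaw, hZ, hL and so on) are illustrative. Because the mean and standard deviation are estimated from the same sample, the plain Kolmogorov-Smirnov p-value is only approximate, so lillietest is included as the variant intended for that case.

```matlab
% Example data vector from the question
data = [8126,3163,9129,5399,8682,1126,1053,7805,2989,2758,3277,1152,6994,6833];

% With no further arguments, kstest compares against the standard normal N(0,1).
% The raw values are in the thousands, so the test rejects regardless of shape.
[hRaw,pRaw] = kstest(data);

% Standardise using the sample mean and standard deviation (z-scores)
dataZ = (data - mean(data)) ./ std(data);
[hZ,pZ,ksstatZ] = kstest(dataZ);

% Lilliefors test: normality test that allows for the mean and standard
% deviation being estimated from the sample
[hL,pL] = lillietest(data);

fprintf('raw: h=%d | standardised: h=%d (p=%.3f) | Lilliefors: h=%d (p=%.3f)\n', ...
    hRaw, hZ, pZ, hL, pL);
```

The same dataZ vector can be passed to ecdf and plotted against normcdf(x_values,0,1) if a plot on the standard normal scale is preferred.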

Upvotes: 3
