fpe

Reputation: 2750

Manipulate data to better fit a Gaussian distribution

I have a question concerning the normal distribution (with mu = 0 and sigma = 1).

Let's say that I first call randn or normrnd this way:

x = normrnd(0,1,[4096,1]); % x = randn(4096,1)

Now, to assess how well the values of x fit the normal distribution, I estimate their mean (a) and standard deviation (b) with

[a,b] = normfit(x);

and, for visual support, I call

histfit(x)
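normfit only estimates the two parameters; it does not give a single number for how far the sample is from N(0,1). One common option (an assumption, not something the question uses) is the Kolmogorov-Smirnov statistic D, the largest gap between the empirical CDF and the target CDF. In MATLAB, the Statistics Toolbox's kstest(x) tests against the standard normal by default. The sketch below is in Python with only the standard library, so the names ks_statistic and summarize are hypothetical helpers, not part of any toolbox.

```python
import random
import statistics
from statistics import NormalDist


def ks_statistic(samples, dist=NormalDist(0.0, 1.0)):
    """Kolmogorov-Smirnov statistic D: the largest gap between the
    empirical CDF of `samples` and the CDF of `dist`."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def summarize(samples):
    """Estimated mean, standard deviation (n-1 denominator) and KS distance to N(0,1)."""
    return statistics.fmean(samples), statistics.stdev(samples), ks_statistic(samples)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    x = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    mu, sigma, d = summarize(x)
    print(f"mean={mu:.4f} std={sigma:.4f} KS D={d:.4f}")
```

A smaller D means a closer match; for 4096 genuinely normal samples, D is typically on the order of 1/sqrt(4096), i.e. around 0.01 to 0.02.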

Now to the core of the question: if I am not satisfied with how well x fits the given normal distribution, how can I modify x so that it better fits the expected normal distribution with mean 0 and standard deviation 1? Because of the relatively small number of samples (4096 in this case), x sometimes fits the expected Gaussian quite poorly, so I want to manipulate x (linearly or not; it does not really matter at this stage) to get a better fit.

Note that I have access to the Statistics Toolbox.

EDIT

  1. I used normrnd and randn in the example because my data are expected to be normally distributed. Within the question, those functions only serve to illustrate my concern.

  2. Would it be possible to apply a least-squares fit?

  3. Generally, the distribution I get is similar to the following: [image not available]


Upvotes: 5

Views: 1617

Answers (2)

tashuhka

Reputation: 5126

Maybe you can try to normalize (standardize) your input data to have mean = 0 and sigma = 1, like this:

y=(x-mean(x))/std(x);
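The same standardization can be sketched in Python with only the standard library (standardize is a hypothetical helper name). statistics.stdev uses the n-1 denominator, which matches MATLAB's default std.

```python
import statistics


def standardize(samples):
    """Affinely rescale samples to sample mean 0 and sample std 1.

    Uses the n-1 denominator for the standard deviation, like MATLAB's std(x).
    """
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return [(v - mu) / sigma for v in samples]


if __name__ == "__main__":
    y = standardize([1.0, 2.0, 3.0, 4.0])
    print(y, statistics.fmean(y), statistics.stdev(y))
```

Because the map is affine, it only moves location and scale: the shape of the histogram (skewness, heavy tails, several modes) is unchanged, so it fixes the first two moments but not a poor fit in shape.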

Upvotes: 3

Memming

Reputation: 1739

If you are searching for a nonlinear transformation that would make your distribution look normal, you can first estimate the cumulative distribution function (CDF) of your data, then compose it with the inverse of the standard normal CDF. This way you can transform almost any continuous distribution into a normal one through an invertible (monotonic) transformation. Take a look at the example code below.

x = randn(1000, 1) + 4 * (rand(1000, 1) < 0.5); % some funky bimodal distribution
xr = linspace(-5, 9, 2000);
F = cumsum(ksdensity(x, xr, 'Bandwidth', 0.5)); F = F / F(end); % estimated CDF on the grid xr; you may want to use a better smoother
c = interp1(xr, F, x); % function composition step 1: estimated CDF value at each sample
c = min(max(c, eps), 1 - eps); % keep values away from 0 and 1 so norminv stays finite
y = norminv(c); % function composition step 2: inverse standard normal CDF
% take a look at the result
figure;
subplot(2,1,1); hist(x, 100);
subplot(2,1,2); hist(y, 100);
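A simpler variant of the same idea, sketched in Python with only the standard library, swaps the kernel-smoothed CDF for the empirical CDF obtained from ranks (often called a rank-based inverse normal transform). The plotting position (rank - 0.5) / n and the helper names rank_inverse_normal and funky_bimodal are assumptions for illustration, not part of the answer above.

```python
import random
import statistics
from statistics import NormalDist


def rank_inverse_normal(samples):
    """Map each sample to the standard normal quantile of its empirical CDF value.

    Uses the plotting position (rank - 0.5) / n so no value reaches 0 or 1,
    where inv_cdf would fail. Ties are broken by original order (stable sort).
    """
    n = len(samples)
    std_normal = NormalDist(0.0, 1.0)
    order = sorted(range(n), key=lambda i: samples[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return out


def funky_bimodal(n, seed=0):
    """Analogue of the answer's test data: N(0,1) plus 4 with probability 0.5."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) + (4.0 if rng.random() < 0.5 else 0.0) for _ in range(n)]


if __name__ == "__main__":
    x = funky_bimodal(1000)
    y = rank_inverse_normal(x)
    print(f"before: mean={statistics.fmean(x):.3f} std={statistics.stdev(x):.3f}")
    print(f"after:  mean={statistics.fmean(y):.3f} std={statistics.stdev(y):.3f}")
```

Unlike the smoothed version, the output depends only on the ordering of the samples: any input of length n maps onto the same set of n normal quantiles, so the result is as close to normal as the sample size allows, but information about the original spacing is discarded.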

Upvotes: 1
