Megan

Reputation: 669

Gaussian Distribution -- Unexpected distribution

I am new to the Gaussian distribution, and my histogram looks a little strange.

What could cause the odd bin between -1 and -0.5?

Here is my code (I am not sure whether it helps):

import numpy as np
import matplotlib.pyplot as plt

mu_0 = 0.178950369
srd_0 = 0.455387161

# aa is a list of float values (not shown here)
aa_list = np.array(aa)
data = aa_list * srd_0 + mu_0

data = data.reshape(-1, 1)

plt.figure(figsize=(12, 8), dpi=400, facecolor='w', edgecolor='k')
# plt.hist returns (bin heights, bin edges, patches)
hx, hy, _ = plt.hist(data, bins=50, density=True, color="blue")

plt.ylim(0.0, 1.5)
plt.title('Gaussian mixture example 01')
plt.grid()

plt.xlim(mu_0 - 4 * srd_0, mu_0 + 4 * srd_0)

Upvotes: 0

Views: 92

Answers (2)

Daisy Welham

Reputation: 235

It's impossible to know without the rest of the code, but it seems the problem may be in cutting off the data. If your code implicitly (or explicitly) has something along the lines of:

if x < -0.75:
    x = -0.75

Then that would explain the spike: all the values less than -0.75 get clamped up to -0.75 and pile up there.
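To see that effect in isolation, here is a minimal sketch using synthetic data. The -0.75 threshold is only the guess from this answer, and the samples are drawn from a Gaussian with the mean and standard deviation given in the question, because the real aa values are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

mu_0, srd_0 = 0.178950369, 0.455387161  # values from the question
threshold = -0.75                        # assumed cutoff, only a guess

# Synthetic stand-in for the real data, which is not shown in the question
samples = rng.normal(mu_0, srd_0, size=100_000)

# Same effect as: if x < threshold: x = threshold
clamped = np.maximum(samples, threshold)

counts, edges = np.histogram(clamped, bins=50)

# The first bin starts at the threshold and collects every clamped value,
# so it is much taller than its neighbour.
print(counts[:3])
```

If the real data shows the same pattern (one bin far taller than the bins beside it, with nothing to its left), clamping is a likely cause.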

It's also possible that you aren't generating enough data, and that the spike at -0.75 is just a random fluctuation. A bin that expects N counts typically fluctuates by about √N, so bins holding only a handful of samples can be off by a large fraction of their height (the more data, the better). You may be able to get rid of that spike just by throwing more data at it. If not, it's almost certainly the cutoff thing.

If it's the cutoff problem, changing the rest of the code to do something equivalent to:

if x < -0.75:
    continue  # inside the loop over samples: skip x so it is not counted

should fix it, at least on the range you're interested in.
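With NumPy arrays, the difference between clamping and skipping can be sketched as follows. The array values and the -0.75 threshold are hypothetical and only illustrate the two behaviours.

```python
import numpy as np

threshold = -0.75  # assumed cutoff from this answer, not a known value

# Hypothetical sample values standing in for the real data
values = np.array([-1.2, -0.9, -0.75, -0.3, 0.1, 0.4, 0.8])

# Clamping keeps every sample but moves the low ones onto the threshold
clamped = np.where(values < threshold, threshold, values)

# Filtering drops the low samples instead, so no spike is created
kept = values[values >= threshold]

print(clamped)
print(kept)
```

Note that filtering removes the spike but also removes the left tail, so the histogram is only comparable to a Gaussian above the threshold.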

Upvotes: 0

Peter K.

Reputation: 8108

Without seeing the values of aa, I'd guess that your data is being clipped on the low end. So, instead of being allowed to go as negative as it wants to go, it's being restricted to some value.

Every sample that would have gone below that clipping threshold lands exactly on it instead, so many samples end up at that one negative value.
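One quick way to test this guess is to look for a value that repeats many times, since continuous Gaussian data should almost never repeat exactly. The helper below is a hypothetical sketch (the name `most_repeated_value` and the example numbers are not from the question).

```python
import numpy as np

def most_repeated_value(data):
    """Return the most frequent value in data and how often it occurs."""
    data = np.asarray(data, dtype=float).ravel()
    data = data[~np.isnan(data)]  # ignore missing values
    values, counts = np.unique(data, return_counts=True)
    i = np.argmax(counts)
    return values[i], int(counts[i])

# Hypothetical values; in the question this would be the `data` array
example = [-0.75, -0.75, -0.75, -0.2, 0.1, 0.3, 0.9]
value, n_repeats = most_repeated_value(example)
print(value, n_repeats)
```

A large repeat count at a single value near the spike would support the clipping explanation.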

This doesn't appear to have much to do with the Gaussian Mixture Model of your title, though. A Gaussian Mixture Model is a stochastic (random) model that assumes each data point is generated by one of several distinct Gaussian distributions, chosen at random. "Mixture" here means that the overall density is a weighted sum of the component densities, not that random values are added together.
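For comparison, here is a minimal sketch of what sampling from a two-component Gaussian mixture looks like. All weights, means and standard deviations are made up for illustration and have nothing to do with the data in the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-component mixture; none of these numbers come from the question
weights = np.array([0.3, 0.7])  # mixing weights, must sum to 1
means = np.array([-1.0, 2.0])
stds = np.array([0.5, 1.0])

n = 10_000
# Step 1: pick a component for each sample according to the weights
component = rng.choice(len(weights), size=n, p=weights)
# Step 2: draw each sample from the Gaussian of its chosen component
samples = rng.normal(means[component], stds[component])

def mixture_pdf(x):
    """Weighted sum of the component Gaussian densities."""
    x = np.asarray(x, dtype=float)[..., None]
    comp = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    return comp @ weights
```

A histogram of `samples` would show two bumps, one per component, which is a different shape from a single narrow spike at one value.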

How to solve it?

This has to start with how the aa data is being acquired. How was it generated? Is the equipment used to capture it set up correctly? Are there some NaN values in the aa data that are converting to a nonsense value? If so, replace the NaN with the average of the non-NaN values on either side of the NaN value.
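The NaN replacement suggested above can be sketched with linear interpolation. For a single isolated NaN this gives exactly the average of its two neighbours; for runs of NaN it interpolates linearly, and NaN at either end takes the nearest valid value. The function name and the example list are hypothetical.

```python
import numpy as np

def fill_nan_by_interpolation(values):
    """Replace NaN entries by linear interpolation between valid neighbours."""
    values = np.asarray(values, dtype=float).copy()
    missing = np.isnan(values)
    if missing.all():
        raise ValueError("no valid samples to interpolate from")
    idx = np.arange(values.size)
    # np.interp evaluates the line through the valid points at the missing indices
    values[missing] = np.interp(idx[missing], idx[~missing], values[~missing])
    return values

# Hypothetical stand-in for the aa list from the question
aa = [0.2, np.nan, 0.6, -0.1, np.nan, np.nan, 0.5]
filled = fill_nan_by_interpolation(aa)
print(filled)
```

This only makes sense if neighbouring samples are related (for example, a time series). If the order of aa carries no meaning, dropping the NaN entries may be the safer choice.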

Upvotes: 2
