Chang

Reputation: 11

python normal distribution

I have a list of numbers, along with their sample mean and SD. I am trying to find the numbers that fall outside mean ± SD, mean ± 2SD and mean ± 3SD. For example, for the mean ± SD case, I wrote this code:

ND1 = [np.mean(l)+np.std(l,ddof=1)]    
ND2 = [np.mean(l)-np.std(l,ddof=1)]

m=sorted(l)

print(m)

ND68 = []

if ND2 > m and m< ND1:

    ND68.append(m<ND2 and m>ND1)
    print (ND68)

Here is my question: can these numbers be found from the list this way? If so, which part am I doing wrong? Or is there a package I can use to solve this?

Upvotes: 1

Views: 286

Answers (2)

StarFox

Reputation: 637

You are on the right track there. You know the mean and standard deviation of your list l, though I'm going to call it something a little less ambiguous, say, samplePopulation.

Because you want to do this for several intervals of standard deviation, I recommend crafting a small function. You can call it multiple times without too much extra work. Also, I'm going to use a list comprehension, which is just a for loop in one line.

import numpy as np

def filter_by_n_std_devs(samplePopulation, numStdDevs):
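    # you mostly got this part right, no need to put them in lists though
    mean = np.mean(samplePopulation) # no brackets needed here
    # or here; note np.std defaults to the population SD (ddof=0),
    # pass ddof=1 as in your code if you want the sample SD
    std = np.std(samplePopulation)
    band = numStdDevs * std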

    # this is the list comprehension
    filteredPop = [x for x in samplePopulation if x < mean - band or x > mean + band]
    return filteredPop

# now call your function with however many std devs you want
filteredPopulation = filter_by_n_std_devs(samplePopulation, 1)
print(filteredPopulation)

Here's a translation of the list comprehension (based on your use of append it looks like you may not know what these are, otherwise feel free to ignore).

# remember that you provide the variable samplePopulation
# the above list comprehension
filteredPop = [x for x in samplePopulation if x < mean - band or x > mean + band]

# is equivalent to this:
filteredPop = []
for x in samplePopulation:
    if x < mean - band or x > mean + band:
        filteredPop.append(x)

So to recap:

  • You don't need to make a list object out of your mean and std calculations
  • The function call lets you plug in your samplePopulation and any number of standard deviations you want without having to go in and manually change the value
  • List comprehensions are one-line for loops, more or less, and you can even do the filtering you want right inside them!
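As a rough sketch of the recap above, here is one way to run the same filter for 1, 2 and 3 standard deviations in a single pass. The function name `outside_n_std_devs`, the `ddof` parameter (defaulting to 1 to match the sample SD used in the question) and the `sample` data are all illustrative assumptions, not part of the original code.

```python
import numpy as np

def outside_n_std_devs(values, num_std_devs, ddof=1):
    """Return the values farther than num_std_devs standard deviations from the mean."""
    mean = np.mean(values)
    std = np.std(values, ddof=ddof)  # ddof=1 gives the sample SD (assumed here)
    band = num_std_devs * std
    return [x for x in values if x < mean - band or x > mean + band]

# made-up data with one obvious outlier
sample = [2, 4, 4, 4, 5, 5, 7, 9, 30]

# one call per interval, collected in a dict keyed by the number of SDs
results = {n: outside_n_std_devs(sample, n) for n in (1, 2, 3)}
for n, outliers in results.items():
    print(n, outliers)
```

With this data only the value 30 falls outside the 1 SD and 2 SD bands, and nothing falls outside the 3 SD band.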

Upvotes: 1

James

Reputation: 36598

This might help. We will use NumPy to grab the values you are looking for. In my example, I create a normally distributed array and then use boolean indexing to return the elements that lie more than 1, 2, or 3 standard deviations from the mean.

import numpy as np

# create a random normally distributed integer array
my_array = np.random.normal(loc=30, scale=10, size=100).astype(int)

# find the mean and standard dev
my_mean = my_array.mean()
my_std = my_array.std()

# find numbers outside of 1, 2, and 3 standard dev
# the portion inside the square brackets returns an
# array of True and False values.  Indexing my_array
# with the boolean array returns only the values where
# the mask is True
out_std_1 = my_array[np.abs(my_array-my_mean) > my_std]
out_std_2 = my_array[np.abs(my_array-my_mean) > 2*my_std]
out_std_3 = my_array[np.abs(my_array-my_mean) > 3*my_std]
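The same masking idea can be sketched with fixed data so the result is reproducible, and with `ddof=1` to match the sample SD in the question. The array `data` and the dictionaries `outside` and `inside` are made up for illustration. Negating the mask with `~` gives the values inside each band.

```python
import numpy as np

# fixed, made-up data so the result does not depend on a random draw
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

mean = data.mean()
std = data.std(ddof=1)  # sample SD, as in the question (an assumption here)

outside = {}
inside = {}
for k in (1, 2, 3):
    # True where the value is more than k standard deviations from the mean
    mask = np.abs(data - mean) > k * std
    outside[k] = data[mask]   # values beyond the band
    inside[k] = data[~mask]   # values within the band
    print(k, outside[k], inside[k])
```

Here 100 lies outside the 1 SD and 2 SD bands but inside the 3 SD band, because a single large outlier also inflates the standard deviation itself.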

Upvotes: 2
