Fredrik

Reputation: 43

Calculating the standard deviation from numbers in python

I'm trying to calculate the standard deviation from a bunch of numbers in a document. Here's what I got so far:

with open("\\Users\\xxx\\python_courses\\1DV501\\assign3\\file_10000integers_B.txt", "r") as f:
    total2 = 0
    number_of_ints2 = 0
    deviation = 0.0
    variance = 0.0
    for line in f:
        for num in line.split(':'):
            total2 += int(num)
            number_of_ints2 += 1
    average = total2/number_of_ints2
    for line in f:
        for num in line.split(":"):
            devation += [(int(num) - average) **2

But I'm completely stuck and don't know how to continue. Math is not my strong suit, so this is turning out to be quite difficult. Also, the document contains a mix of negative and positive numbers, if that makes any difference.

Upvotes: 2

Views: 2836

Answers (3)

jkr

Reputation: 19260

The problem is that you are iterating over the file twice, and you didn't reset the reader to the beginning of the file before the second loop. You can use f.seek(0) to do this.

total2 = 0
number_of_ints2 = 0
deviation = 0.0
variance = 0.0

with open("numbers.txt", "r") as f:
    for line in f:
        for num in line.split(':'):
            total2 += int(num)
            number_of_ints2 += 1
    average = total2 / number_of_ints2
    f.seek(0)  # Move back to the beginning of the file.
    for line in f:
        for num in line.split(":"):
            deviation += (int(num) - average) ** 2
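
The loop above leaves the sum of squared deviations in deviation; it still has to be divided by the count and square-rooted to get the standard deviation. A minimal sketch of the remaining steps, assuming the population formula (divide by n) and wrapping both passes in a hypothetical helper named file_stddev that accepts any seekable text file object:

```python
import io
import math


def file_stddev(f, sep=":"):
    """Population standard deviation of sep-separated integers in a seekable text file."""
    total = 0
    count = 0
    # First pass: sum and count, needed for the mean.
    for line in f:
        for num in line.split(sep):
            if num.strip():  # skip empty fields, e.g. from a trailing separator
                total += int(num)
                count += 1
    average = total / count

    f.seek(0)  # rewind before the second pass
    squared_diffs = 0.0
    for line in f:
        for num in line.split(sep):
            if num.strip():
                squared_diffs += (int(num) - average) ** 2

    variance = squared_diffs / count  # use (count - 1) for the sample variance
    return math.sqrt(variance)


# Stand-in for the real file: io.StringIO behaves like an open text file.
sample = io.StringIO("2:4:4:4\n5:5:7:9\n")
print(file_stddev(sample))  # 2.0
```

Negative numbers need no special handling: int() parses the sign, and squaring the difference makes every term non-negative.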

Upvotes: 1

Cory Kramer

Reputation: 117866

You can use one of several available libraries. For example, suppose I had some data from somewhere:

>>> import random
>>> data = [random.randint(1,100) for _ in range(100)]  # assume from your txt file

I could use statistics.stdev

>>> import statistics
>>> statistics.stdev(data)
28.453646514989956

or numpy.std

>>> import numpy as np
>>> np.std(data)
28.311020822287563

or scipy.stats.tstd

>>> import scipy.stats
>>> scipy.stats.tstd(data)
28.453646514989956

or if you want to roll your own

import math

def stddev(data):
    # Population standard deviation: divide by len(data), not len(data) - 1.
    mean = sum(data) / len(data)
    return math.sqrt((1/len(data)) * sum((i-mean)**2 for i in data))

>>> stddev(data)
28.311020822287563

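Note that the slight difference between the computed values depends on whether you want the "sample" standard deviation (divide by n - 1, which is what statistics.stdev and scipy.stats.tstd compute by default) or the "population" standard deviation (divide by n, which is what numpy.std computes by default and what the hand-rolled function does).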
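
To make the difference concrete, here is a small sketch that uses only the standard library. The helper stddev_ddof and its ddof parameter are illustrative names (ddof mirrors the keyword of the same name accepted by numpy.std); ddof=0 gives the population value and ddof=1 the sample value.

```python
import math
import statistics


def stddev_ddof(data, ddof=0):
    """Standard deviation dividing by (n - ddof): 0 = population, 1 = sample."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - ddof))


data = [2, 4, 4, 4, 5, 5, 7, 9]

print(stddev_ddof(data, ddof=0))   # population: 2.0
print(statistics.pstdev(data))     # population: 2.0
# The sample version agrees with statistics.stdev up to floating-point rounding.
print(math.isclose(stddev_ddof(data, ddof=1), statistics.stdev(data)))  # True
```

With 10000 numbers the two results will be very close, since dividing by n or by n - 1 barely matters for large n.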

Upvotes: 2

Catalina Chircu

Reputation: 1572

You can use the stdev function from the statistics module in the standard library; see the official documentation.

Put your numbers in a list, then apply the function:

from statistics import stdev
mylist = [1,2,5,10,100]
std = stdev(mylist)
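
Getting the numbers from the file into a list could look like the sketch below. It assumes the colon-separated layout shown in the question; read_ints is a hypothetical helper, and io.StringIO stands in for the real file so that the snippet runs on its own.

```python
import io
from statistics import stdev


def read_ints(f, sep=":"):
    """Collect every sep-separated integer from a text file object into a list."""
    return [int(num) for line in f for num in line.split(sep) if num.strip()]


# With a real file: with open("numbers.txt") as f: mylist = read_ints(f)
f = io.StringIO("1:2:5\n10:100\n")
mylist = read_ints(f)
print(mylist)         # [1, 2, 5, 10, 100]
print(stdev(mylist))  # sample standard deviation of the list
```

Loading everything into a list is fine for 10000 integers and avoids reading the file twice.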

Upvotes: 1
