Reputation: 43
I'm trying to calculate the standard deviation from a bunch of numbers in a document. Here's what I got so far:
with open("\\Users\\xxx\\python_courses\\1DV501\\assign3\\file_10000integers_B.txt", "r") as f:
    total2 = 0
    number_of_ints2 = 0
    deviation = 0.0
    variance = 0.0
    for line in f:
        for num in line.split(':'):
            total2 += int(num)
            number_of_ints2 += 1
    average = total2/number_of_ints2
    for line in f:
        for num in line.split(":"):
            devation += [(int(num) - average) **2
But I'm completely stuck and don't know how to continue. Math is not my strong suit, so this is turning out to be quite difficult. Also, the document contains a mix of negative and positive numbers, if that makes any difference.
Upvotes: 2
Views: 2836
Reputation: 19260
The problem is that you are iterating over the file twice without resetting the reader to the beginning of the file before the second loop, so the second loop has nothing left to read. You can call f.seek(0) to move back to the start.
total2 = 0
number_of_ints2 = 0
deviation = 0.0
variance = 0.0
with open("numbers.txt", "r") as f:
    for line in f:
        for num in line.split(':'):
            total2 += int(num)
            number_of_ints2 += 1
    average = total2 / number_of_ints2
    f.seek(0)  # Move back to the beginning of the file.
    for line in f:
        for num in line.split(":"):
            deviation += (int(num) - average) ** 2
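The code above stops once the squared deviations have been summed. Here is a minimal sketch of the remaining steps, assuming the population formula (divide by the count) and the colon-separated format from the question. The function name two_pass_std and the in-memory sample data are made up for illustration and stand in for the real file.

```python
import io
import math


def two_pass_std(f):
    """Population standard deviation of colon-separated integers in file object f."""
    total = 0
    count = 0
    for line in f:
        for num in line.strip().split(":"):
            if num:  # skip empty pieces such as blank lines
                total += int(num)
                count += 1
    average = total / count

    f.seek(0)  # rewind before the second pass
    squared_deviations = 0.0
    for line in f:
        for num in line.strip().split(":"):
            if num:
                squared_deviations += (int(num) - average) ** 2

    variance = squared_deviations / count  # use (count - 1) for the sample variance
    return math.sqrt(variance)


# Hypothetical stand-in for the real file; io.StringIO supports seek(0) like a file.
sample = io.StringIO("2:4:4\n4:5:5\n7:9\n")
result = two_pass_std(sample)
print(result)
```

Negative numbers need no special handling here: int() parses them directly, and squaring each deviation makes every term non-negative.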
Upvotes: 1
Reputation: 117866
You can use one of several available libraries. For example, suppose I had some data that I got from somewhere:
>>> import random
>>> data = [random.randint(1,100) for _ in range(100)] # assume from your txt file
I could use statistics.stdev
>>> import statistics
>>> statistics.stdev(data)
28.453646514989956
or numpy.std
>>> import numpy as np
>>> np.std(data)
28.311020822287563
or scipy.stats.tstd
>>> import scipy.stats
>>> scipy.stats.tstd(data)
28.453646514989956
or if you want to roll your own
import math
def stddev(data):
    mean = sum(data) / len(data)
    return math.sqrt((1/len(data)) * sum((i-mean)**2 for i in data))
>>> stddev(data)
28.311020822287563
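Note that the slight difference in the computed values depends on whether you want the "sample" standard deviation (statistics.stdev and scipy.stats.tstd, which divide by n - 1) or the "population" standard deviation (numpy.std by default and the hand-rolled stddev, which divide by n).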
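To make the distinction concrete, here is a small sketch using only the standard library's statistics module, which provides both variants (stdev for sample, pstdev for population). The data values are made up for illustration.

```python
import math
import statistics

# Made-up data: mean is 5 and the squared deviations sum to 32.
values = [2, 4, 4, 4, 5, 5, 7, 9]

population = statistics.pstdev(values)  # divides by n:     sqrt(32 / 8)
sample = statistics.stdev(values)       # divides by n - 1: sqrt(32 / 7)

print(population)
print(sample)

# The two differ only by the factor sqrt(n / (n - 1)).
n = len(values)
print(math.isclose(sample, population * math.sqrt(n / (n - 1))))
```

With a large file, such as the 10000 integers in the question, that factor is very close to 1, so the two results will be nearly identical.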
Upvotes: 2
Reputation: 1572
You can use the stdev function from the standard library's statistics module (see the official documentation for that module).
Put your numbers in a list, then apply the function:
from statistics import stdev
mylist = [1,2,5,10,100]
std = stdev(mylist)
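To get from the file in the question to such a list, something along these lines might work. It assumes colon-separated integers on each line, as in the question's code. The helper name read_numbers and the in-memory sample data are hypothetical.

```python
import io
from statistics import stdev


def read_numbers(f):
    """Collect every colon-separated integer from file object f into a list."""
    numbers = []
    for line in f:
        for piece in line.strip().split(":"):
            if piece:  # ignore empty pieces from blank lines or trailing colons
                numbers.append(int(piece))
    return numbers


# Hypothetical stand-in for open("numbers.txt"); negative values parse fine with int().
f = io.StringIO("1:-2:5\n10:100\n")
mylist = read_numbers(f)
print(mylist)
print(stdev(mylist))  # sample standard deviation
```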
Upvotes: 1