Reputation: 2263
I want to find mean and standard deviation of 1st, 2nd,... digits of several (Z) lists. For example, I have
A_rank=[0.8,0.4,1.2,3.7,2.6,5.8]
B_rank=[0.1,2.8,3.7,2.6,5,3.4]
C_rank=[1.2,3.4,0.5,0.1,2.5,6.1]
# etc (up to Z_rank )...
Now I want to take the mean and std of *_rank[0], the mean and std of *_rank[1], etc.
(ie: mean and std of the 1st digit from all the (A..Z)_rank lists;
the mean and std of the 2nd digit from all the (A..Z)_rank lists;
the mean and std of the 3rd digit...; etc).
Upvotes: 154
Views: 438308
Reputation: 1014
Using Python, here are a few methods:
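import math
import statistics as st

n = int(input())
data = list(map(float, input().split()))  # float, so values like 0.8 parse too

# Method 1: population standard deviation from the statistics module
stdev = st.pstdev(data)

# Method 2: square root of the population variance
variance = st.pvariance(data)
devia = math.sqrt(variance)

# Method 3: by hand
mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / n
stddev = variance ** 0.5
print("{0:0.1f}".format(stddev))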
variance calculates the variance of a sample, while pvariance calculates the variance of an entire population. The same distinction applies to stdev and pstdev.
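As a quick illustration of the sample/population distinction, the sketch below runs all four functions on the question's A_rank list. It assumes Python 3.4+, and the variable names are only illustrative.

```python
import math
import statistics as st

A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]  # sample data from the question

# Sample statistics divide the sum of squared deviations by n - 1.
sample_var = st.variance(A_rank)
sample_sd = st.stdev(A_rank)

# Population statistics divide by n.
pop_var = st.pvariance(A_rank)
pop_sd = st.pstdev(A_rank)

n = len(A_rank)
# The two variances differ only by the factor (n - 1) / n.
assert math.isclose(pop_var, sample_var * (n - 1) / n)

print(sample_sd, pop_sd)
```

The population value is always the smaller of the two, and the gap shrinks as the list grows.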
Upvotes: 15
Reputation: 176810
Here's some pure-Python code you can use to calculate the mean and standard deviation.
All code below is based on the statistics module in Python 3.4+.
def mean(data):
    """Return the sample arithmetic mean of data."""
    n = len(data)
    if n < 1:
        raise ValueError('mean requires at least one data point')
    return sum(data)/n # in Python 2 use sum(data)/float(n)

def _ss(data):
    """Return sum of square deviations of sequence data."""
    c = mean(data)
    ss = sum((x-c)**2 for x in data)
    return ss

def stddev(data, ddof=0):
    """Calculates the population standard deviation
    by default; specify ddof=1 to compute the sample
    standard deviation."""
    n = len(data)
    if n < 2:
        raise ValueError('variance requires at least two data points')
    ss = _ss(data)
    pvar = ss/(n-ddof)
    return pvar**0.5
Note: for improved accuracy when summing floats, the statistics module uses a custom function _sum rather than the built-in sum, which I've used in its place.
Now we have for example:
>>> mean([1, 2, 3])
2.0
>>> stddev([1, 2, 3]) # population standard deviation
0.816496580927726
>>> stddev([1, 2, 3], ddof=1) # sample standard deviation
1.0
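On the note about float summation: the statistics module's _sum is private, so one hedged stand-in is math.fsum from the standard library, which tracks partial sums to avoid accumulated rounding error. The sketch below shows how it might be swapped in. It is an assumption for illustration, not what the statistics module does internally, and mean_fsum is a hypothetical name.

```python
import math

def mean_fsum(data):
    """Arithmetic mean using math.fsum instead of the built-in sum."""
    n = len(data)
    if n < 1:
        raise ValueError('mean requires at least one data point')
    return math.fsum(data) / n

values = [0.1] * 10
naive_total = sum(values)        # accumulates rounding error
exact_total = math.fsum(values)  # correctly rounded sum
print(naive_total == 1.0, exact_total == 1.0)
```

For short lists of well-scaled values the difference rarely matters, but it can show up when many small floats are added together.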
Upvotes: 55
Reputation: 14448
Since Python 3.4 / PEP 450 there is a statistics module in the standard library, which has a function stdev for calculating the standard deviation of iterables like yours:
>>> A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
>>> import statistics
>>> statistics.stdev(A_rank)
2.0634114147853952
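To get the per-position statistics the question asks for with only the standard library, one option is to transpose the lists with zip and apply the statistics functions to each resulting tuple. This is a minimal sketch: rank_lists, columns, means and stdevs are names introduced here, and only the three lists shown in the question are included.

```python
import statistics

A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
B_rank = [0.1, 2.8, 3.7, 2.6, 5, 3.4]
C_rank = [1.2, 3.4, 0.5, 0.1, 2.5, 6.1]
rank_lists = [A_rank, B_rank, C_rank]  # extend with the remaining lists

# zip(*rank_lists) yields one tuple per position: (A[0], B[0], C[0]), ...
columns = list(zip(*rank_lists))

means = [statistics.mean(col) for col in columns]
# Sample standard deviation; use statistics.pstdev for the population version.
stdevs = [statistics.stdev(col) for col in columns]

for i, (m, s) in enumerate(zip(means, stdevs)):
    print(i, round(m, 4), round(s, 4))
```

Note that zip stops at the shortest list, so this assumes all rank lists have the same length.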
Upvotes: 206
Reputation: 111
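Pure Python code:
from functools import reduce  # reduce is not a built-in on Python 3
from math import sqrt

def stddev(lst):
    """Population standard deviation of lst."""
    mean = float(sum(lst)) / len(lst)
    return sqrt(float(reduce(lambda x, y: x + y, map(lambda x: (x - mean) ** 2, lst))) / len(lst))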
Upvotes: 5
Reputation: 1365
The other answers cover how to do std dev in Python sufficiently, but no one explains how to do the bizarre traversal you've described.
I'm going to assume A-Z is the entire population. If not, see Ome's answer on how to infer from a sample.
So to get the standard deviation/mean of the first digit of every list you would need something like this:
#standard deviation
numpy.std([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])
#mean
numpy.mean([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])
To shorten the code and generalize this to any nth digit use the following function I generated for you:
def getAllNthRanks(n):
    return [A_rank[n], B_rank[n], C_rank[n], D_rank[n], E_rank[n], F_rank[n], G_rank[n], H_rank[n], I_rank[n], J_rank[n], K_rank[n], L_rank[n], M_rank[n], N_rank[n], O_rank[n], P_rank[n], Q_rank[n], R_rank[n], S_rank[n], T_rank[n], U_rank[n], V_rank[n], W_rank[n], X_rank[n], Y_rank[n], Z_rank[n]]
Now you can simply get the std and mean of all the nth places from A-Z like this:
#standard deviation
numpy.std(getAllNthRanks(n))
#mean
numpy.mean(getAllNthRanks(n))
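If the lists can be kept in a container instead of 26 separate variables, the same idea needs no hard-coded names. The sketch below is one possible variant under that assumption: the ranks dictionary and get_all_nth_ranks are hypothetical, and statistics.pstdev stands in for numpy.std (both compute the population standard deviation by default) so the example runs with the standard library alone.

```python
import statistics

# Hypothetical container: every rank list stored under its name.
ranks = {
    "A": [0.8, 0.4, 1.2, 3.7, 2.6, 5.8],
    "B": [0.1, 2.8, 3.7, 2.6, 5, 3.4],
    "C": [1.2, 3.4, 0.5, 0.1, 2.5, 6.1],
    # ... add the remaining lists here
}

def get_all_nth_ranks(ranks, n):
    """Return the n-th entry of every list in the ranks mapping."""
    return [values[n] for values in ranks.values()]

first_entries = get_all_nth_ranks(ranks, 0)
print(statistics.mean(first_entries))
print(statistics.pstdev(first_entries))  # population std, like numpy.std's default
```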
Upvotes: 3
Reputation: 331
In Python 2.7.1, you can calculate the standard deviation using numpy.std():
Population std: just call numpy.std() with no additional arguments besides your data list.
Sample std: pass ddof=1, as in numpy.std(< your-list >, ddof=1)
The divisor used in the calculation is N - ddof, where N is the number of elements. By default ddof is zero.
With ddof=1 it calculates the sample std rather than the population std.
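The effect of ddof on the divisor can be checked by hand. The sketch below assumes NumPy is installed and compares numpy.std against the N - ddof formula computed in plain Python. The variable names are illustrative, and the data is A_rank from the question.

```python
import math
import numpy

data = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]  # A_rank from the question
N = len(data)
mean = sum(data) / N
ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

population_std = numpy.std(data)       # divisor N (ddof=0, the default)
sample_std = numpy.std(data, ddof=1)   # divisor N - 1

# Both results match the N - ddof formula computed by hand.
assert math.isclose(population_std, math.sqrt(ss / N))
assert math.isclose(sample_std, math.sqrt(ss / (N - 1)))
print(population_std, sample_std)
```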
Upvotes: 22
Reputation: 2020
In Python 2.7 you can use NumPy's numpy.std(), which gives the population standard deviation.
In Python 3.4, statistics.stdev() returns the sample standard deviation. The pstdev() function is the same as numpy.std().
Upvotes: 13
Reputation: 500357
I would put A_rank et al. into a 2D NumPy array, and then use numpy.mean() and numpy.std() to compute the means and the standard deviations:
In [17]: import numpy
In [18]: arr = numpy.array([A_rank, B_rank, C_rank])
In [20]: numpy.mean(arr, axis=0)
Out[20]:
array([ 0.7 , 2.2 , 1.8 , 2.13333333, 3.36666667,
5.1 ])
In [21]: numpy.std(arr, axis=0)
Out[21]:
array([ 0.45460606, 1.29614814, 1.37355985, 1.50628314, 1.15566239,
1.2083046 ])
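If the lists are a sample rather than the whole population, the same axis-wise call accepts ddof=1. This is a small sketch assuming NumPy is installed and using only the three lists given in the question. sample_std is a name introduced here.

```python
import numpy

A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
B_rank = [0.1, 2.8, 3.7, 2.6, 5, 3.4]
C_rank = [1.2, 3.4, 0.5, 0.1, 2.5, 6.1]

arr = numpy.array([A_rank, B_rank, C_rank])  # shape (3, 6): one row per list

# axis=0 reduces across rows, giving one value per position.
sample_std = numpy.std(arr, axis=0, ddof=1)  # divisor is (number of lists - 1)
print(sample_std)
```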
Upvotes: 120