More efficient way to calculate standard deviation of a large list in Python

Question

Hello, I am trying to calculate a series of standard deviations over a list about 20,000 values long, one for each growing prefix of the list. Here is an example of my code:

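from statistics import stdev

def main():
    a = list(range(20000))
    b = []

    # stdev needs at least two points, so start with the prefix of length 2.
    # Stop at len(a) + 1 so the full list is included exactly once.
    for x in range(2, len(a) + 1):
        b.append(stdev(a[:x]))

    print(b)

main()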

This method is extremely slow and I am trying to figure out a way to make it more efficient. Any help is appreciated. Thank you.

[Done] exited with code=null in 820.376 seconds

DSM · Accepted Answer

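It looks like you want an expanding standard deviation, for which I'd use the pandas library (imported as pd below) and the pandas.Series.expanding method. For comparison, the first call assumes your main() has been changed to return b rather than print it: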

In [156]: main()[:5]
Out[156]: 
[0.7071067811865476,
 1.0,
 1.2909944487358056,
 1.5811388300841898,
 1.8708286933869707]

In [157]: pd.Series(range(20000)).expanding().std()[:5]
Out[157]: 
0         NaN
1    0.707107
2    1.000000
3    1.290994
4    1.581139
dtype: float64

where you could easily slice away the first element and convert to a list if you wanted:

In [158]: pd.Series(range(20000)).expanding().std()[1:6].tolist()
Out[158]: 
[0.7071067811865476,
 1.0,
 1.2909944487358056,
 1.5811388300841898,
 1.8708286933869707]

That said, a Series is a much more useful data type than a list for working with time series, and it is far faster here:

In [159]: %timeit pd.Series(range(20000)).expanding().std()
1.07 ms ± 30.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
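
If you'd rather not pull in pandas, the same expanding result can be built in a single pass by updating a running mean and sum of squared deviations (Welford's method) instead of recomputing stdev on every slice. The following is a minimal sketch of that idea, not a description of what pandas does internally; the name expanding_stdev is made up for illustration.

```python
from math import isclose, sqrt
from statistics import stdev


def expanding_stdev(values):
    """Sample standard deviation of values[:2], values[:3], ..., values[:n].

    Hypothetical helper: uses Welford's running update, so the whole
    list is processed once instead of once per prefix.
    """
    result = []
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        if count >= 2:  # the sample standard deviation needs two points
            result.append(sqrt(m2 / (count - 1)))
    return result


data = list(range(2000))
running = expanding_stdev(data)

# Spot-check a few prefixes against statistics.stdev.
for n in (2, 3, 10, 2000):
    assert isclose(running[n - 2], stdev(data[:n]), rel_tol=1e-9)
```

The returned list holds one value per prefix of length 2 through len(values), which is the same shape as the list b built in the question.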
