relgames

Reputation: 1376

Why does SciPy's std return the wrong result?

import scipy

timeseries = [53.0, 28.0, 20.0, 113.0, 68.0, 18.0, 9.0, 72.0, 37.0, 29.0, 16.0, 70.0, 45.0, 3.0, 79.0, 7.0, 17.0, 0.0, 84.0, 19.0,
          0.0, 1.0, 5.0, 16.0, 1485.3333, 650.0, 39.0, 52.0, 82.0, 13.0, 11.0, 14.0, 31.0, 20.0, 399.0, 124.0, 39.0, 0.0, 9.0,
          42.0, 41.0, 98.5, 10.0, 4.0, 19.0, 53.0, 60.0, 789.0, 1471.3333, 876.0, 5.0, 714.0, 136.0, 27.0, 38.0, 29.0, 10.0,
          181.0, 1.0, 14.0, 39.0, 29.0, 2.0, 1502.0, 174.5, 4.0, 305.0, 222.6667, 349.0, 38.0, 15.0, 168.0, 41.0, 28.0, 15.0,
          508.0, 57.0, 26.0, 146.0, 50.5, 20.0, 12.0, 10.0, 15.0, 3.0, 19.0, 2922.0, 5200.5, 2989.0, 0.0, 5.0, 13.0, 2.0, 2.0,
          4.0, 32.0, 66.0, 4.0, 36.0, 1.0, 6.0, 8.0, 88.0, 3.0, 7.0, 250.0, 0.5, 9.0, 0.0, 94.0, 16.0, 3.0, 6.0, 15.0, 4.0, 4.0,
          240.0, 266.6667, 1208.0, 2387.0, 3883.5, 2997.3333, 2667.0, 417.5, 3.0, 26.0, 15.0, 11.0, 4.0, 70.0, 202.0, 2.0, 13.0,
          3.0, 1.0, 6.0, 7.0, 5.0, 140.0, 954.0, 2343.0, 5264.6667, 6051.5, 1181.0, 489.5, 879.0, 1531.0, 2064.3333, 1472.0,
          2029.3333, 3112.0, 2232.6667, 45.0, 716.5, 997.0, 1374.6667, 1993.5, 2549.0, 2690.5, 2640.3333, 2514.5, 1230.0, 475.5,
          893.3333, 1984.5, 2054.3333, 1800.5, 2793.3333, 3630.5, 4305.3333, 5214.0, 5790.6667]

series = scipy.array(timeseries)
stdDev = scipy.std(series, dtype=scipy.float64)

print(stdDev)

This prints 1246.16323355, while a Java program using Commons Math returns 1249.801674091763.
When I check it with http://easycalculation.com/statistics/standard-deviation.php, it also returns 1249.80167.

What is wrong with SciPy's standard deviation?

Upvotes: 3

Views: 89

Answers (1)

Warren Weckesser

Reputation: 114946

Read the Notes section of the docstring for numpy.std (which is the same as scipy.std). By default, std divides the sum of the squared deviations by n before taking the square root. To match the value returned by your other tools, use ddof=1, which makes it divide by n - 1:

In [2]: a = np.array(timeseries)

In [3]: np.std(a)
Out[3]: 1246.1632335502143

In [4]: np.std(a, ddof=1)
Out[4]: 1249.8016740917631
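
To make the difference between the two divisors concrete, here is a minimal pure-Python sketch of the calculation. The helper name std_with_ddof and the small sample data are made up for illustration and are not part of NumPy or SciPy. The ddof argument only imitates the meaning of the parameter described above (divisor n - ddof). The standard library's statistics.pstdev and statistics.stdev are used as a cross-check.

```python
import math
import statistics


def std_with_ddof(values, ddof=0):
    """Standard deviation using the divisor n - ddof.

    Hypothetical helper for illustration only; it mimics the meaning of
    the ddof parameter of numpy.std, not its implementation.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


# Small made-up data set: mean is 5, sum of squared deviations is 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_std = std_with_ddof(data)       # divides by n = 8
sample_std = std_with_ddof(data, ddof=1)   # divides by n - 1 = 7

print(population_std)
print(sample_std)

# Cross-check against the standard library.
print(math.isclose(population_std, statistics.pstdev(data)))
print(math.isclose(sample_std, statistics.stdev(data)))
```

With ddof=0 you get the population standard deviation. With ddof=1 you get the sample standard deviation, which is what the other tools in the question appear to report.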

Upvotes: 6
