Reputation: 259
I use statistics.mean() to calculate the mean of a sampled distribution. However, in the following code, the value returned by statistics.mean() is a rounded integer. If I use numpy.mean() instead, I get the correct float-typed result. So what is going on here?
import statistics
from scipy import stats
posterior_sample = stats.beta.rvs(3, 19, size = 1000)
predictive_sample = stats.binom.rvs(100, posterior_sample, size = 1000)
print(statistics.mean(predictive_sample))
print(statistics.mean([(data >= 15).astype(int) for data in predictive_sample]))
Upvotes: 3
Views: 5710
Reputation: 9988
statistics.mean does not support the numpy.int64 data type.
From the docs for statistics:
Unless explicitly noted otherwise, these functions support int, float, decimal.Decimal and fractions.Fraction. Behaviour with other types (whether in the numeric tower or not) is currently unsupported. Mixed types are also undefined and implementation-dependent. If your input data consists of mixed types, you may be able to use map() to ensure a consistent result, e.g. map(float, input_data).
To get around this, you can do as the docs suggest and convert your data to float before passing it to statistics.mean().
print(statistics.mean(map(float, predictive_sample)))
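To make that workaround reusable, here is a minimal sketch. The helper name mean_as_float is hypothetical (it is not part of statistics or NumPy), and the example uses plain Python ints so it runs without NumPy installed. numpy.int64 values should go through float() the same way, but that part is an assumption, not something the snippet checks.

```python
import statistics


def mean_as_float(values):
    """Hypothetical helper: coerce every value to float, then average.

    Because every input is a plain float, statistics.mean has no
    integer-like input type to convert the result back to.
    """
    return statistics.mean(map(float, values))


print(mean_as_float([3, 4]))  # 3.5
```

The trade-off is that exact types such as Fraction or Decimal lose their exactness once coerced to float, so this only makes sense for data you are happy to treat as floating point.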
Now for the underlying reason behind this behaviour:
At the end of the source code for statistics.mean, there is a call to statistics._convert, which is meant to convert the returned value to an appropriate type (e.g. Fraction if the inputs are fractions, float if the inputs are int, etc.).
A single line in _convert is meant to catch other data types and ensure that the returned value is consistent with the provided data (T is the data type of each input value, value is the calculated mean):
try:
    return T(value)
If your input is numpy.int64, then the _convert function tries to convert the calculated mean to the numpy.int64 data type. NumPy happily converts a float to an int by dropping the fractional part (rounded down for positive values, I think). Hence the mean function returns the mean truncated to an integer, not rounded to the nearest one, encoded as numpy.int64.
If your input data is numpy.float64, then you won't have this problem.
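The conversion step described above can be illustrated without NumPy. This is only a sketch under stated assumptions: convert_like_input is a hypothetical stand-in that reproduces just the T(value) line quoted earlier, not the real statistics._convert. The built-in int plays the role of numpy.int64, and the mean is assumed to be held as an exact Fraction before conversion.

```python
from fractions import Fraction


def convert_like_input(value, T):
    """Hypothetical, simplified stand-in for the T(value) step.

    It casts the computed mean back to the type of the inputs,
    with none of the real function's other handling.
    """
    return T(value)


exact_mean = Fraction(3 + 4, 2)  # exact mean of [3, 4], i.e. 7/2

# Integer-like input type: the fractional part is dropped.
print(convert_like_input(exact_mean, int))    # 3

# Float input type: the value survives intact.
print(convert_like_input(exact_mean, float))  # 3.5
```

The same mean comes out as 3 or 3.5 depending only on the type it is cast back to, which matches the difference seen between integer-typed and float-typed input.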
Upvotes: 4