rifle123

Reputation: 259

Why does Python's statistics.mean() return an int instead of a float?

I use statistics.mean() to calculate the mean of a sampled distribution. However, in the code below, the value it returns is an integer rather than a float. If I use numpy.mean() instead, I get the correct float result. What is going on here?

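import statistics
from scipy import stats

# Draw 1000 success probabilities from a Beta(3, 19) posterior.
posterior_sample = stats.beta.rvs(3, 19, size=1000)
# Draw one Binomial(100, p) count per sampled probability; the elements are NumPy integers.
predictive_sample = stats.binom.rvs(100, posterior_sample, size=1000)
print(statistics.mean(predictive_sample))
# Share of predictive draws that are at least 15 (each element is a NumPy integer, 0 or 1).
print(statistics.mean([(data >= 15).astype(int) for data in predictive_sample]))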

Upvotes: 3

Views: 5710

Answers (1)

Andrew Guy

Reputation: 9988

statistics.mean does not support the numpy.int64 data type.

From the docs for statistics:

Unless explicitly noted otherwise, these functions support int, float, decimal.Decimal and fractions.Fraction. Behaviour with other types (whether in the numeric tower or not) is currently unsupported. Mixed types are also undefined and implementation-dependent. If your input data consists of mixed types, you may be able to use map() to ensure a consistent result, e.g. map(float, input_data).

To get around this, you can do as the docs suggest and convert your data to float before passing it to statistics.mean():

print(statistics.mean(map(float, predictive_sample)))
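For reference, here is a minimal sketch of that workaround that runs without NumPy or SciPy. The `counts` list is made up and uses built-in ints as a stand-in for the sampled data, so it only shows the mechanics of the conversion. The `statistics.fmean()` line assumes Python 3.8 or newer, where that function exists and converts its input to float itself.

```python
import statistics

# Made-up integer counts standing in for the sampled data in the question.
counts = [14, 15, 17]

# Convert every element to a built-in float before averaging.
float_mean = statistics.mean(map(float, counts))

# Assumes Python 3.8+: fmean() converts its input to float on its own.
fast_mean = statistics.fmean(counts)

print(type(float_mean).__name__, float_mean)
print(type(fast_mean).__name__, fast_mean)
```

Either way the result is a built-in float, so the fractional part of the mean is kept.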

Now for the underlying reason behind this behaviour:

At the end of the source code for statistics.mean, there is a call to statistics._convert, which converts the returned value to an appropriate type (e.g. Fraction if the inputs are fractions, float if the inputs are ints).
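A small illustration of how the return type follows the input type, for types the docs list as supported. The input values are made up for illustration, and only the standard library is used.

```python
import statistics
from decimal import Decimal
from fractions import Fraction

# Illustrative inputs (not from the question); each list holds a single supported type.
int_mean = statistics.mean([1, 2])  # ints whose mean is not a whole number
fraction_mean = statistics.mean([Fraction(1, 2), Fraction(1, 4)])
decimal_mean = statistics.mean([Decimal("0.5"), Decimal("0.75")])

print(type(int_mean).__name__, int_mean)            # float 1.5
print(type(fraction_mean).__name__, fraction_mean)  # Fraction 3/8
print(type(decimal_mean).__name__, decimal_mean)    # Decimal 0.625
```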

A single line in _convert is meant to catch other data types, and ensure that the returned value is consistent with the provided data (T is the data type for each input value, value is the calculated mean):

try:
    return T(value)

If your input is numpy.int64, then the _convert function tries to convert the calculated mean to the numpy.int64 data type. NumPy happily converts a float to an int by truncating it (dropping the fractional part) rather than rounding to the nearest integer. Hence the mean function returns the mean truncated to an integer, encoded as numpy.int64.
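To make the effect of that conversion step concrete, here is a simplified sketch. `convert_sketch` is a hypothetical stand-in written for this illustration, not the library's code, and the built-in `int` is used in place of an integer type such as numpy.int64 so that the example runs without NumPy.

```python
from fractions import Fraction


def convert_sketch(value, T):
    """Hypothetical, simplified stand-in for the T(value) step; not the library's code."""
    return T(value)


# An exact mean of 15.5, made up for illustration.
exact_mean = Fraction(31, 2)

as_float = convert_sketch(exact_mean, float)  # keeps the fractional part
as_int = convert_sketch(exact_mean, int)      # int() drops the fractional part

print(as_float, as_int)  # 15.5 15
```

Passing an integer type as T is what discards the fractional part. Passing float keeps it, which is why converting the inputs to float first avoids the problem.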

If your input data is numpy.float64, then you won't have this problem.

Upvotes: 4
