Reputation: 51

Why does Python's statistics.mean() function act differently when passed a numpy.ndarray or a list?

Why does statistics.mean behave so strangely? When passed a numpy.ndarray, it outputs this as the average:

```
statistics.mean(np.array([1,4,9]))
4
```

When passed a list, it outputs the actual mean:

```
statistics.mean([1,4,9])
4.666666666666667
```

I'm using Python 3.7.

Upvotes: 5

Answers (3)

abc

Reputation: 11939

This is due to the definition of the function statistics.mean. The function uses a subroutine _convert.

In the case of a list, it will be called as _convert(Fraction(14, 3), int).

Since int counts as a subclass of int, the code executed will be
```
if issubclass(T, int) and value.denominator != 1:
    T = float
try:
    return T(value)
```
In the numpy array case it will be called as _convert(Fraction(14, 3), np.int64) and the code executed will just be
```
try:
    return T(value)
```
since np.int64 is not a subclass of int.
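
The two branches can be reproduced without NumPy. The following is only a sketch: `convert` is a simplified copy of the branch logic (no error handling), and `NotAnIntSubclass` is a hypothetical class standing in for np.int64, on the assumption that np.int64 drops the fractional part of a Fraction the way int() does.

```python
from fractions import Fraction


class NotAnIntSubclass:
    """Hypothetical stand-in for np.int64: integer-like, but not a subclass of int."""

    def __init__(self, value):
        # int() on a Fraction truncates toward zero.
        self.value = int(value)


def convert(value, T):
    """Simplified version of the branch in statistics._convert (no error handling)."""
    if issubclass(T, int) and value.denominator != 1:
        T = float  # only int subclasses are promoted to float
    return T(value)


mean = Fraction(1 + 4 + 9, 3)

from_list = convert(mean, int)                # promoted to float
from_array = convert(mean, NotAnIntSubclass)  # fractional part is dropped

print(from_list)         # 4.666666666666667
print(from_array.value)  # 4
```

The only difference between the two calls is whether `issubclass(T, int)` is true, which is what decides if the result is promoted to float before conversion.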

Upvotes: 1

Thomas Sablik

Reputation: 16448

No, it doesn't return the median in the first case. It returns the mean value as numpy.int64 because the input is an array of non-primitive integers.

If you pass non-primitive objects to statistics.mean the result will be converted to the input data type. In your case statistics.mean does something equivalent to:

```
np.int64(sum(np.array([1,4,9]))/len(np.array([1,4,9])))
```

I'm using Python 3.8. Here is the code for mean:

```
def mean(data):
    """Return the sample arithmetic mean of data.

    >>> mean([1, 2, 3, 4, 4])
    2.8

    >>> from fractions import Fraction as F
    >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
    Fraction(13, 21)

    >>> from decimal import Decimal as D
    >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
    Decimal('0.5625')

    If ``data`` is empty, StatisticsError will be raised.
    """
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    T, total, count = _sum(data)
    assert count == n
    return _convert(total/n, T)
```

Here is the code for _sum:

```
def _sum(data, start=0):
    """_sum(data [, start]) -> (type, sum, count)

    Return a high-precision sum of the given numeric data as a fraction,
    together with the type to be converted to and the count of items.

    If optional argument ``start`` is given, it is added to the total.
    If ``data`` is empty, ``start`` (defaulting to 0) is returned.


    Examples
    --------

    >>> _sum([3, 2.25, 4.5, -0.5, 1.0], 0.75)
    (<class 'float'>, Fraction(11, 1), 5)

    Some sources of round-off error will be avoided:

    # Built-in sum returns zero.
    >>> _sum([1e50, 1, -1e50] * 1000)
    (<class 'float'>, Fraction(1000, 1), 3000)

    Fractions and Decimals are also supported:
    >>> from fractions import Fraction as F
    >>> _sum([F(2, 3), F(7, 5), F(1, 4), F(5, 6)])
    (<class 'fractions.Fraction'>, Fraction(63, 20), 4)

    >>> from decimal import Decimal as D
    >>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")]
    >>> _sum(data)
    (<class 'decimal.Decimal'>, Fraction(6963, 10000), 4)

    Mixed types are currently treated as an error, except that int is
    allowed.
    """
    count = 0
    n, d = _exact_ratio(start)
    partials = {d: n}
    partials_get = partials.get
    T = _coerce(int, type(start))
    for typ, values in groupby(data, type):
        T = _coerce(T, typ)  # or raise TypeError
        for n,d in map(_exact_ratio, values):
            count += 1
            partials[d] = partials_get(d, 0) + n
    if None in partials:
        # The sum will be a NAN or INF. We can ignore all the finite
        # partials, and just look at this special one.
        total = partials[None]
        assert not _isfinite(total)
    else:
        # Sum all the partial sums using builtin sum.
        # FIXME is this faster if we sum them in order of the denominator?
        total = sum(Fraction(n, d) for d, n in sorted(partials.items()))
    return (T, total, count)
```

Here is the code for _convert:


```
def _convert(value, T):
    """Convert value to given numeric type T."""
    if type(value) is T:
        # This covers the cases where T is Fraction, or where value is
        # a NAN or INF (Decimal or float).
        return value
    if issubclass(T, int) and value.denominator != 1:
        T = float
    try:
        # FIXME: what do we do if this overflows?
        return T(value)
    except TypeError:
        if issubclass(T, Decimal):
            return T(value.numerator)/T(value.denominator)
        else:
            raise
```
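
The except branch in _convert exists because Decimal cannot be built directly from a Fraction. A small sketch of that fallback, using the result from the mean docstring above (the variable names here are only illustrative):

```python
from decimal import Decimal
from fractions import Fraction

# Exact mean of 0.5, 0.75, 0.625 and 0.375, as a Fraction.
value = Fraction(9, 16)

try:
    Decimal(value)  # Decimal does not accept a Fraction
    direct_conversion_failed = False
except TypeError:
    direct_conversion_failed = True

# The same fallback that _convert uses for Decimal.
fallback = Decimal(value.numerator) / Decimal(value.denominator)

print(direct_conversion_failed)  # True
print(fallback)                  # 0.5625
```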

Upvotes: 2

wasif

Reputation: 15498

No, it is not the median. statistics.mean() expects a list; you get the truncated value because you passed a NumPy array of integers. To calculate the mean of a NumPy array, use np.mean(np.array([1,4,9])).
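
If you would rather stay with the statistics module, two standard-library options avoid the integer conversion. This is only a sketch: it uses a plain list so that it runs without NumPy, and it assumes the same generator would work for an integer array, since float() accepts NumPy integer scalars. Note that statistics.fmean needs Python 3.8 or later, so it is not available on 3.7.

```python
import statistics

data = [1, 4, 9]

# Convert each element to a Python float first, so the result type is float.
via_floats = statistics.mean(float(x) for x in data)

# fmean (Python 3.8+) always converts to float and returns a float.
via_fmean = statistics.fmean(data)

print(via_floats)  # 4.666666666666667
print(via_fmean)   # 4.666666666666667
```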

Upvotes: 1
