Reputation: 58
I'm using Python 3.6, trying to get the mean of some values in a subset of a row of a pandas dataframe (pandas version 0.23.4). I'm getting the values with .loc[] and then trying to get their mean with mean() from Python's statistics module, like so:
import statistics as st
rows = ['row1','row2','row3']
somelist = []
for i in rows:
    a = df.loc[i,"Q1":"Q7"]
    somelist.append(st.mean(a))
I end up getting answers without any decimal places. If I manually type the answers to items Q1:Q7 into a list, this is the result:
a = st.mean([2,3,4,4,2,6,5])
print(a)
Out: 3.7142857142857144
But if that sequence was what I pulled from the dataframe, I get a mean with no decimal places, like so:
a = st.mean(df.loc[i,"Q1":"Q7"])
Out: 3
Evidently it's because it thinks it's a numpy.int64 instead of a float. This happens even if I convert the slice from the dataframe into a list, like this:
a = st.mean(list(df.loc[i,"Q1":"Q7"]))
Out: 3
Weirdly, it does NOT happen if I use .mean() :
a = df.loc[i,"Q1":"Q7"].mean()
Out: 3.7142857142857144
I double-checked the st.stdev() method and it seems to work fine. What's going on? Why does it want to print out an integer for the mean automatically? Thanks!
Upvotes: 0
Views: 1286
Reputation: 114811
statistics.mean converts the output to the same type as the inputs. If the input values are all, say, numpy.int64, the result is converted to numpy.int64. Here's the source for statistics.mean in Python 3.6.7:
def mean(data):
    """Return the sample arithmetic mean of data.

    >>> mean([1, 2, 3, 4, 4])
    2.8

    >>> from fractions import Fraction as F
    >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])
    Fraction(13, 21)

    >>> from decimal import Decimal as D
    >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")])
    Decimal('0.5625')

    If ``data`` is empty, StatisticsError will be raised.
    """
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    T, total, count = _sum(data)
    assert count == n
    return _convert(total/n, T)
Note that total/n is converted to the input type before being returned.
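As a rough illustration of that type-following behaviour, here is a small sketch using only the standard library, so it does not reproduce the numpy.int64 case itself. The variable names are made up for the example. It shows that Fraction and Decimal inputs give back the same type, while plain Python ints with a non-whole mean come back as a float:

```python
import statistics as st
from decimal import Decimal
from fractions import Fraction

# Plain Python ints whose mean is not a whole number: the result is a float.
int_mean = st.mean([1, 2, 4])

# Fraction inputs: the result stays a Fraction.
frac_mean = st.mean([Fraction(1, 2), Fraction(1, 3)])

# Decimal inputs: the result stays a Decimal.
dec_mean = st.mean([Decimal("0.5"), Decimal("0.75")])

# Print each result alongside the name of its type.
for label, value in [("int", int_mean), ("Fraction", frac_mean), ("Decimal", dec_mean)]:
    print(label, "->", type(value).__name__, value)
```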
To avoid this, you could convert the input to floating point before passing it to statistics.mean.
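A minimal sketch of that workaround follows. The helper name mean_as_float is hypothetical, and the sample values are the ones from the question typed in as plain ints rather than pulled from the dataframe:

```python
import statistics as st


def mean_as_float(values):
    """Hypothetical helper: cast every value to a Python float, then average."""
    return st.mean([float(v) for v in values])


# Values copied from the question (Q1..Q7 for one row).
answers = [2, 3, 4, 4, 2, 6, 5]
result = mean_as_float(answers)
print(result)

# With the question's dataframe, the call would presumably look like:
#   somelist.append(mean_as_float(df.loc[i, "Q1":"Q7"]))
```

Because every element is cast to float first, the type that statistics.mean converts back to is float, whatever integer type the dataframe row held.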
Upvotes: 1
Reputation: 87
I think the problem is in your for loop. Try printing a for each row you go through, along with the mean that gets appended to the list.
Upvotes: 0