Pandas issue with sums

Question

Using pandas 0.11, there appears to be a bug (or at least non-intuitive behavior) when assigning the result of sum() to a column of another DataFrame. Any advice?

import pandas

p = pandas.DataFrame({'x': [1, 2, 3], 'y': [1, 2, 3]})
sumOfP = p.sum()  # Gives a Series of [6, 6]. OK.
totals = pandas.DataFrame({'someOtherSeries': [1, 2]})
totals['sumOfP'] = sumOfP  # BAD! This is now [nan, nan]

I'd expect totals['sumOfP'] to be [6,6]. So why is it nan,nan?

DSM · Accepted Answer

It's because pandas aligns on the index when you assign a Series to a column. Look closer at p.sum():

>>> sumOfP = p.sum()
>>> sumOfP
x    6
y    6
dtype: int64

This is a Series indexed by x and y, and you're trying to cram it into a new column in a DataFrame with indices of 0 and 1. That's great, says the totals frame, but you didn't tell me what should go in the "sumOfP" column at indices 0 and 1, and I'm not going to guess. Compare what happens when one of the labels does match, here a column named 0:

>>> p = pandas.DataFrame({ 0 : [1,2,3], 'y' : [1,2,3] })
>>> totals["sumOfP"] = p.sum()
>>> totals
   someOtherSeries  sumOfP
0                1       6
1                2     NaN

[2 rows x 2 columns]
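
To make the alignment explicit, here is a minimal sketch, assuming a reasonably recent pandas imported as pd (the snake_case names are illustrative, not from the question). Assigning a Series to a column behaves like reindexing that Series to the frame's index first, so labels that don't exist in the Series come out as NaN:

```python
import pandas as pd

p = pd.DataFrame({"x": [1, 2, 3], "y": [1, 2, 3]})
totals = pd.DataFrame({"someOtherSeries": [1, 2]})

# The column sums are labelled by the column names of p: 'x' and 'y'.
sum_of_p = p.sum()

# Reindexing to the labels of totals (0 and 1) finds no matches,
# so every entry becomes NaN.
aligned = sum_of_p.reindex(totals.index)

# Column assignment does the same label lookup behind the scenes.
totals["sumOfP"] = sum_of_p
```

Neither 0 nor 1 appears among the labels 'x' and 'y', which is why the whole column ends up missing rather than holding the sums.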

If you want to ignore the indices, you can just put the raw values in:

>>> totals["sumOfP"] = sumOfP.values
>>> totals
   someOtherSeries  sumOfP
0                1       6
1                2       6

[2 rows x 2 columns]

or reset the index beforehand:

>>> sumOfP.reset_index(drop=True)
0    6
1    6
dtype: int64
>>> totals["sumOfP"] = sumOfP.reset_index(drop=True)
>>> totals
   someOtherSeries  sumOfP
0                1       6
1                2       6

[2 rows x 2 columns]
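
Putting both workarounds side by side, as a self-contained sketch under the same assumptions as the question's data (the column names byValues and byResetIndex are made up for illustration):

```python
import pandas as pd

p = pd.DataFrame({"x": [1, 2, 3], "y": [1, 2, 3]})
totals = pd.DataFrame({"someOtherSeries": [1, 2]})
sum_of_p = p.sum()

# Option 1: hand over the bare array, so there are no labels to align on.
totals["byValues"] = sum_of_p.values

# Option 2: relabel the Series 0, 1, ... so it lines up with totals' index.
totals["byResetIndex"] = sum_of_p.reset_index(drop=True)
```

The two options differ when the lengths don't match: assigning a bare array of the wrong length raises a ValueError, while a reset-index Series is still aligned by label, so extra entries are dropped and missing ones become NaN. Both only make sense here because totals happens to have exactly as many rows as p has columns.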
