Reputation: 1562
Using pandas 0.11, there appears to be a bug (or at least non-intuitive behavior) when assigning the result of a sum to a DataFrame column. Any advice?
import pandas
p = pandas.DataFrame({ 'x' : [1,2,3], 'y' : [1,2,3] })
sumOfP = p.sum() #Gives a Series of [6,6]. OK.
totals = pandas.DataFrame({ 'someOtherSeries' : [1,2] })
totals['sumOfP'] = sumOfP #BAD! This is now [nan, nan]
I'd expect totals['sumOfP'] to be [6,6]. So why is it nan,nan?
Upvotes: 0
Views: 55
Reputation: 352959
It's because they're aligning on the index. Look closer at p.sum():
>>> sumOfP = p.sum()
>>> sumOfP
x    6
y    6
dtype: int64
This is a Series indexed by x and y, and you're trying to cram it into a new column in a DataFrame with indices of 0 and 1. That's great, says the totals frame, but you didn't tell me what should go in the "sumOfP" column at indices 0 and 1, and I'm not going to guess. Compare:
>>> p = pandas.DataFrame({ 0 : [1,2,3], 'y' : [1,2,3] })
>>> totals["sumOfP"] = p.sum()
>>> totals
   someOtherSeries  sumOfP
0                1       6
1                2     NaN
[2 rows x 2 columns]
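The alignment can also be checked programmatically. The following is a minimal sketch, assuming a recent pandas release (where isna() is available) and the same p and totals frames as in the question; the names labels_in_common and all_missing are made up for illustration:

```python
import pandas as pd

p = pd.DataFrame({"x": [1, 2, 3], "y": [1, 2, 3]})
totals = pd.DataFrame({"someOtherSeries": [1, 2]})

# p.sum() is a Series whose index is the column labels "x" and "y".
sum_of_p = p.sum()

# Column assignment aligns the Series on totals.index, which is 0 and 1.
totals["sumOfP"] = sum_of_p

# No label appears in both indexes, so every row of the new column is NaN.
labels_in_common = totals.index.intersection(sum_of_p.index)
all_missing = bool(totals["sumOfP"].isna().all())

print(list(sum_of_p.index), list(totals.index))
print(len(labels_in_common), all_missing)
```

If the two indexes shared any label, only the rows with that label would be filled, which is what the example with the column named 0 shows.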
If you want to ignore the indices, you could just assign the underlying values (this works because the lengths match):
>>> totals["sumOfP"] = sumOfP.values
>>> totals
   someOtherSeries  sumOfP
0                1       6
1                2       6
[2 rows x 2 columns]
or reset the index beforehand:
>>> sumOfP.reset_index(drop=True)
0    6
1    6
dtype: int64
>>> totals["sumOfP"] = sumOfP.reset_index(drop=True)
>>> totals
   someOtherSeries  sumOfP
0                1       6
1                2       6
[2 rows x 2 columns]
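For reference, here are both workarounds side by side as a self-contained script. This is only a sketch, assuming a recent pandas version (to_numpy() is the newer spelling of .values); the column names viaValues and viaResetIndex are made up for the example:

```python
import pandas as pd

p = pd.DataFrame({"x": [1, 2, 3], "y": [1, 2, 3]})
totals = pd.DataFrame({"someOtherSeries": [1, 2]})
sum_of_p = p.sum()

# Both workarounds are positional, so the lengths have to agree.
if len(sum_of_p) != len(totals):
    raise ValueError("cannot assign positionally: lengths differ")

# Option 1: assign the raw array, which carries no index to align on.
totals["viaValues"] = sum_of_p.to_numpy()

# Option 2: drop the "x"/"y" labels so the Series is indexed 0, 1,
# matching the default index of totals.
totals["viaResetIndex"] = sum_of_p.reset_index(drop=True)

print(totals)
```

Note that the second option relies on totals having the default 0..n-1 index. If totals were indexed by something else, the reset Series would fail to align again, whereas assigning the raw array would still work.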
Upvotes: 2