I tried to calculate the sum of a grouped pd.DataFrame (or pd.Series) containing inf. In doing so, I found that the position of the inf values in the original pd.DataFrame determines whether the result is nan or inf.
Here is an example. Assume there is a Series df with a two-level MultiIndex:
import numpy as np
import pandas as pd
mid = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2), ('c', 1), ('c', 2)])
df = pd.Series(np.array([np.inf, 1, 1, np.inf, np.inf, np.inf]), index=mid)
df
a  1    inf
   2    1.0
b  1    1.0
   2    inf
c  1    inf
   2    inf
dtype: float64
If I calculate the sum of the grouped Series, I get nan for groups a and c, but inf for group b:
df.groupby(level=[0]).agg(sum)
a NaN
b inf
c NaN
dtype: float64
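For comparison, here is a minimal sketch, assuming pandas and NumPy are installed, that sums each group through a plain lambda instead of the built-in sum. With a lambda, pandas calls the function once per group on that group's values rather than using its own grouped sum, so the result reflects ordinary NumPy summation. The name per_group is only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the example: a Series with a two-level MultiIndex
mid = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2), ('c', 1), ('c', 2)])
df = pd.Series(np.array([np.inf, 1, 1, np.inf, np.inf, np.inf]), index=mid)

# The lambda is called once per group and sums that group's raw values with NumPy
per_group = df.groupby(level=0).apply(lambda s: float(s.to_numpy().sum()))
print(per_group)
```

If every group comes out as inf here, the nan from agg(sum) is introduced by the grouped sum in pandas, not by the data itself.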
I would expect inf for all of them, as (np.inf+1)==(1+np.inf) and (np.inf+1)==(np.inf+np.inf) both evaluate to True.
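The expectation can be checked with plain Python floats, which follow the same IEEE 754 rules as NumPy's float64. In this sketch, naive_sum is a hypothetical helper that adds values strictly left to right; with it, every group sums to inf no matter where the inf sits. In addition and subtraction, nan only appears when opposite infinities meet, e.g. inf - inf.

```python
import math

inf = math.inf

# The comparisons from the question
assert (inf + 1) == (1 + inf)
assert (inf + 1) == (inf + inf)

# Hypothetical helper: plain left-to-right summation
def naive_sum(values):
    total = 0.0
    for x in values:
        total += x
    return total

# Raw values of each group from the example Series
groups = {"a": [inf, 1.0], "b": [1.0, inf], "c": [inf, inf]}
naive = {key: naive_sum(vals) for key, vals in groups.items()}
print(naive)  # every group sums to inf

# nan only appears when opposite infinities meet
print(inf - inf)
```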
The result is the same for np.nansum.
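One way to check that the nan does not come from np.nansum itself is to call it directly on each group's raw values. This is a minimal sketch, assuming NumPy is installed; the dictionary of groups simply mirrors the example data. If these all give inf, the nan presumably arises in the grouped code path pandas uses for agg(np.nansum), not in NumPy.

```python
import numpy as np

# Raw values of each group from the example Series
groups = {
    "a": np.array([np.inf, 1.0]),
    "b": np.array([1.0, np.inf]),
    "c": np.array([np.inf, np.inf]),
}

# np.nansum treats NaN as zero; none of these inputs contain NaN
direct = {key: np.nansum(vals) for key, vals in groups.items()}
for key, value in direct.items():
    print(key, value)
```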
There have been known bugs involving inf and .agg(sum):
However, they are all either closed or completed, and none of them addresses the position of inf within the pd.Series.
Can somebody explain to me why the order of the inf values matters in this calculation, and why the sum of two inf values results in nan?
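One explanation that reproduces exactly this pattern is compensated (Kahan) summation. The sketch below assumes, without having checked it against the pandas 1.4.4 source, that the grouped sum keeps a Kahan-style compensation term; kahan_sum is a hypothetical pure-Python stand-in, not pandas code. The compensation is computed as (t - total) - y, which becomes inf - inf = nan as soon as the running total t is inf. If that happens on the last element of a group (group b), the total is already inf and is returned unchanged. If it happens earlier (groups a and c), the nan compensation is subtracted from the next value and the total itself becomes nan. This would also explain why two inf values give nan: the first inf poisons the compensation term before the second one is added.

```python
import math

def kahan_sum(values):
    """Hypothetical stand-in for a grouped sum with Kahan compensation."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in values:
        y = x - comp            # correct the next value by the compensation
        t = total + y
        comp = (t - total) - y  # nan as soon as t is inf (inf - inf)
        total = t
    return total

inf = math.inf
# Raw values of each group from the example Series
groups = {"a": [inf, 1.0], "b": [1.0, inf], "c": [inf, inf]}
for key, vals in groups.items():
    print(key, kahan_sum(vals))
```

This prints nan for a and c and inf for b, matching the groupby output above.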
My pandas version is 1.4.4.
Thank you in advance!