gtd88

Does the order of inf and non-inf values matter in pandas groupby.agg(sum)?

I tried to calculate the sum of a grouped pd.DataFrame (or pd.Series) containing inf. In doing so, I noticed that the position of the inf in the original pd.DataFrame determines whether the result is NaN or inf.

Here is an example. Let us assume there is a Series df with a two-level MultiIndex:

import numpy as np
import pandas as pd
mid = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2), ('c', 1), ('c', 2)])
df = pd.Series(np.array([np.inf, 1, 1, np.inf, np.inf, np.inf]), index=mid)

df
a  1    inf
   2    1.0
b  1    1.0
   2    inf
c  1    inf
   2    inf
dtype: float64

If I calculate the sum per group, I get NaN for groups a and c, but inf for group b:

df.groupby(level=[0]).agg(sum)

a    NaN
b    inf
c    NaN
dtype: float64

I would expect inf for all of them, as (np.inf+1)==(1+np.inf) and (np.inf+1)==(np.inf+np.inf) both result in True.
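The element-wise reasoning above can be checked in plain Python. `np.inf` is an ordinary IEEE 754 double, the same value as `math.inf`, so this sketch uses only the standard library. The helper `naive_sum` is a hypothetical illustration of simple left-to-right accumulation, not anything pandas provides:

```python
import math

inf = math.inf  # same IEEE 754 value as np.inf

# The identities quoted in the question
assert inf + 1 == 1 + inf
assert inf + 1 == inf + inf

# Group values in the order they appear in the Series
groups = {"a": [inf, 1.0], "b": [1.0, inf], "c": [inf, inf]}

def naive_sum(values):
    """Hypothetical helper: plain left-to-right accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total

naive = {key: naive_sum(vals) for key, vals in groups.items()}
print(naive)  # {'a': inf, 'b': inf, 'c': inf}
```

Under this naive accumulation every group sums to inf regardless of where the inf sits, which is why the NaN results look surprising.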

The result is the same for np.nansum.
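To separate NumPy's own behavior from the groupby path, the same group values can be reduced with `np.sum` and `np.nansum` directly. This is a sketch assuming the groups are split out by hand; the `groups` dictionary below is hypothetical and simply mirrors the Series from the question:

```python
import numpy as np

# Raw values of each group, in the order they appear in the Series
groups = {
    "a": np.array([np.inf, 1.0]),
    "b": np.array([1.0, np.inf]),
    "c": np.array([np.inf, np.inf]),
}

# NumPy reductions applied per group, outside of pandas
np_sum = {key: np.sum(vals) for key, vals in groups.items()}
np_nansum = {key: np.nansum(vals) for key, vals in groups.items()}

print(np_sum)
print(np_nansum)
```

Both reductions give inf for every group. That suggests (an assumption, not verified against the pandas source) that `agg(np.nansum)`, like `agg(sum)`, is dispatched to pandas' own cythonized groupby sum rather than calling NumPy on each group.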

There have been known bugs with inf and pd.agg(sum):

However, they are all either closed or completed, and none of them addresses the order of inf within the pd.Series.

Can somebody explain to me why the order of inf matters in this calculation, and why the sum of two infs results in NaN?
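One explanation that reproduces the observed pattern exactly, offered as an assumption rather than a confirmed reading of the pandas 1.4.4 source: the cythonized groupby sum may accumulate with compensated (Kahan) summation. Its correction term computes `(t - total) - y`, and as soon as the running sum `t` becomes inf, one of those subtractions is inf - inf, which is NaN. The NaN correction then poisons every later addition, so a group only ends in inf if its single inf is the last element. The function `kahan_sum` below is a hypothetical pure-Python sketch of textbook Kahan summation, not the pandas implementation:

```python
import math

def kahan_sum(values):
    """Textbook Kahan (compensated) summation; hypothetical sketch, not pandas code."""
    total = 0.0
    compensation = 0.0  # running estimate of lost low-order bits
    for x in values:
        y = x - compensation  # NaN here once compensation is NaN
        t = total + y
        # As soon as t is inf, one subtraction is inf - inf, so this becomes NaN
        compensation = (t - total) - y
        total = t
    return total

groups = {"a": [math.inf, 1.0], "b": [1.0, math.inf], "c": [math.inf, math.inf]}
result = {key: kahan_sum(vals) for key, vals in groups.items()}
print(result)  # {'a': nan, 'b': inf, 'c': nan}
```

If this assumption holds, group b survives only because its inf arrives last, before the NaN correction is ever applied to a subsequent element.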

My pandas version is 1.4.4

Thank you in advance!
