a_guest

Reputation: 36249

Pandas rolling returns NaN when infinity values are involved

When I call rolling on a Series that contains inf values, the result contains NaN even when the operation is well defined for inf, such as min or max. For example:

import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.inf, 5, 6])
print(s.rolling(window=3).min())

This gives:

0    NaN
1    NaN
2    1.0
3    NaN
4    NaN
5    NaN
dtype: float64

while I expected

0    NaN
1    NaN
2    1.0
3    2.0
4    3.0
5    5.0

Computing the minimum of the series directly works as expected:

s.min()  # 1.0

What is the reason for additional NaN values being introduced?


Python 3.8.1, pandas 1.0.2

Upvotes: 9

Views: 636

Answers (1)

ALollz

Reputation: 59549

np.inf is explicitly converted to np.nan in pandas/core/window/rolling.py:

# Convert inf to nan for C funcs
inf = np.isinf(values)
if inf.any():
    values = np.where(inf, np.nan, values)

The question "How to represent inf or -inf in Cython with numpy?" gives some background on why this conversion was needed.
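To check that the extra NaN values come from this conversion rather than from min itself, the masking step can be applied by hand to the data from the question. This is only a sketch of the effect, not the pandas code path; the names values, converted and result are made up for illustration.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3, np.inf, 5, 6], dtype="float64")

# Same masking step as in the quoted snippet, applied by hand.
inf = np.isinf(values)
converted = np.where(inf, np.nan, values)

# Index 3 is now missing, so every window of size 3 that covers it holds
# only two valid observations. That is below the default min_periods,
# which equals the window size for an integer window.
result = pd.Series(converted).rolling(window=3).min()
print(result)
```

Only index 2 has a full window of valid values here, so it is the only position with a number in the output.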


You'd find the exact same behavior if you had NaN instead of np.inf. Getting your expected output is awkward because min_periods discards the windows that contain the converted value: they no longer have enough valid observations. One clean "hack" is to replace inf with the largest finite float64 value, which should be fairly safe when taking the min.

import numpy as np
s.replace(np.inf, np.finfo('float64').max).rolling(3).min()

#0    NaN
#1    NaN
#2    1.0
#3    2.0
#4    3.0
#5    5.0
#dtype: float64
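The same idea can be wrapped in a small helper that also covers -inf and restores the infinities afterwards. This is a sketch, not pandas API: the name rolling_min_keep_inf is hypothetical, and it assumes the real data never equals the largest finite float64, since that value is used as a stand-in.

```python
import numpy as np
import pandas as pd


def rolling_min_keep_inf(s: pd.Series, window: int) -> pd.Series:
    """Rolling min that treats +/-inf as ordinary values (hypothetical helper)."""
    big = np.finfo("float64").max
    # Swap the infinities for the largest finite floats so rolling() keeps them.
    # Assumption: genuine data never equals +/-big.
    finite = s.astype("float64").clip(lower=-big, upper=big)
    out = finite.rolling(window).min()
    # Map the stand-ins back, e.g. for a window that held nothing but inf.
    return out.replace([big, -big], [np.inf, -np.inf])


s = pd.Series([1, 2, 3, np.inf, 5, 6])
print(rolling_min_keep_inf(s, 3))

# For contrast, relaxing min_periods on the NaN-converted data recovers
# indices 3 to 5 but also fills index 1, which the question expected as NaN.
relaxed = s.replace(np.inf, np.nan).rolling(3, min_periods=2).min()
print(relaxed)
```

Mapping the stand-in back matters only when a whole window is infinite; for the series in the question the helper returns the same values as the one-liner above.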

Upvotes: 6
