Reputation: 36249
When using rolling on a series that contains inf values, the result contains NaN even if the operation is well defined, like min or max. For example:
import numpy as np
import pandas as pd
s = pd.Series([1, 2, 3, np.inf, 5, 6])
print(s.rolling(window=3).min())
This gives:
0 NaN
1 NaN
2 1.0
3 NaN
4 NaN
5 NaN
dtype: float64
while I expected
0 NaN
1 NaN
2 1.0
3 2.0
4 3.0
5 5.0
Computing the minimum of the series directly works as expected:
s.min() # 1.0
What is the reason for the additional NaN values being introduced?
Python 3.8.1, pandas 1.0.2
Upvotes: 9
Views: 636
Reputation: 59549
np.inf is explicitly converted to np.nan in pandas/core/window/rolling.py:
# Convert inf to nan for C funcs
inf = np.isinf(values)
if inf.any():
values = np.where(inf, np.nan, values)
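To see what that conversion does to the data from the question, here is a minimal sketch that applies the same three lines to a plain NumPy array and then rolls over the result. The variable name converted is only for illustration, and this mirrors the excerpt rather than calling pandas internals.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3, np.inf, 5, 6], dtype="float64")

# Same steps as the excerpt: find the infinities, then swap in NaN.
inf = np.isinf(values)
if inf.any():
    values = np.where(inf, np.nan, values)

print(values)  # the entry at index 3 is now nan

# Rolling over the already-converted data: every window that touches
# index 3 has only two valid observations, so its minimum is NaN.
converted = pd.Series(values).rolling(window=3).min()
print(converted)
```

Only index 2 ends up with a value here, which matches the output shown in the question.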
The question "How to represent inf or -inf in Cython with numpy?" gives information on why they had to do this.
You'd find the exact same behavior if you had NaN instead of np.inf. It can be difficult to get your expected output because min_periods, which defaults to the window size for an integer window, causes those intermediate windows to be thrown away: they lack sufficient valid observations. One clean "hack" is to replace inf with the largest finite float, which should be rather safe when taking the min:
import numpy as np
s.replace(np.inf, np.finfo('float64').max).rolling(3).min()
#0 NaN
#1 NaN
#2 1.0
#3 2.0
#4 3.0
#5 5.0
#dtype: float64
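Two variations on this, as a rough sketch rather than a recommended implementation. The first relaxes min_periods, which fills the windows around the inf but also fills the first two rows that the question wanted to stay NaN. The second wraps the replacement in a helper, rolling_min_with_inf (a hypothetical name, not a pandas function), that maps the sentinel back to inf. It assumes the data never contains the largest finite float as a genuine value and only handles positive infinity.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.inf, 5, 6])

# Variation 1: accept windows with at least one valid observation.
# The leading rows are no longer NaN, unlike the expected output.
relaxed = s.rolling(window=3, min_periods=1).min()
print(relaxed.tolist())

# Variation 2: sentinel round trip (assumption: no real value equals the sentinel).
SENTINEL = np.finfo("float64").max


def rolling_min_with_inf(series, window):
    """Hypothetical helper: rolling min that treats +inf as an ordinary value."""
    result = series.replace(np.inf, SENTINEL).rolling(window).min()
    # A window made up only of +inf yields the sentinel; turn it back into inf.
    return result.mask(result == SENTINEL, np.inf)


print(rolling_min_with_inf(s, 3).tolist())
```

The round trip only matters when a whole window is inf; for the series in the question the helper gives the same values as the one-liner above.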
Upvotes: 6