pandas rolling mean returns different results on different machines

Question

I'm running the same code on the same input on my local machine (Python 3.9.5, pandas 0.25.3) and on a remote machine (Python 3.7.4, pandas 0.25.1), and I get different results.

The input is:

jsn_str = '{"user_1":{"77":4514.0,"44":7867.54,"67":10406.54,"12":7151.0,"56":1921.0,"36":9471.0,"47":2021.0,"25":3211.0,"26":2021.0,"15":4651.0,"71":8805.0,"62":352.0}}'

I use the code:

pd.DataFrame(json.loads(jsn_str), index=(str(x) for x in range(93))).fillna(method='ffill').fillna(0).sum(axis=1).rolling(window=1).mean().fillna(0).unique()

On my local machine I get the expected result:

array([    0.  ,  7151.  ,  4651.  ,  3211.  ,  2021.  ,  9471.  ,
        7867.54,  1921.  ,   352.  , 10406.54,  8805.  ,  4514.  ])

But on the remote machine, the result is:

array([    0.  ,  7151.  ,  4651.  ,  3211.  ,  2021.  ,  9471.  ,
        7867.54,  7867.54,  2021.  ,  1921.  ,   352.  , 10406.54,
        8805.  ,  4514.  ])

The remote result contains one extra occurrence each of 2021. and 7867.54. For some reason, rolling(window=1).mean() returns values with slightly different floating-point noise, such as 2021.000000000001, 2021.0, 7867.540000000001 and 7867.540000000002, and unique() treats each of them as a distinct value.
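To make the effect concrete, here is a minimal sketch that feeds the noisy values quoted above straight into unique(). The Series is built by hand for illustration and is not produced by the rolling call; the variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Hand-built values of the kind reported above: the same number with and
# without a tiny error in the last digits (illustrative, not rolling output).
values = pd.Series([
    2021.0, 2021.000000000001,
    7867.54, 7867.540000000001, 7867.540000000002,
])

# unique() compares floats exactly, so every variant counts separately.
n_distinct = len(values.unique())

# With a tolerance, the variants are indistinguishable from the clean values.
close_to_2021 = bool(np.isclose(values.iloc[:2], 2021.0).all())
close_to_7867 = bool(np.isclose(values.iloc[2:], 7867.54).all())

print(n_distinct, close_to_2021, close_to_7867)
```

So the extra entries are not new data, only the same numbers carrying a rounding error that an exact comparison cannot ignore.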

This happens in many more of my examples, and I cannot work out why or when it occurs. (I also cannot remove rolling(window=1).mean() from my code.)

Has anyone encountered this behavior? Any suggestions?

Paritosh Singh · Accepted Answer

I set up a few Python environments and was able to reproduce this behaviour using two Python 3.7 environments with different pandas versions, so it seems to be directly or indirectly related to pandas 0.25.1.

I used the following modified code snippet.

import pandas as pd
import numpy as np
import json
import sys

# Show which interpreter and pandas version produced the output
print(sys.version)
print(pd.__version__)

jsn_str = '{"user_1":{"77":4514.0,"44":7867.54,"67":10406.54,"12":7151.0,"56":1921.0,"36":9471.0,"47":2021.0,"25":3211.0,"26":2021.0,"15":4651.0,"71":8805.0,"62":352.0}}'

# Same preprocessing as in the question (despite the name, df is a Series after sum)
df = pd.DataFrame(json.loads(jsn_str), index=(str(x) for x in range(93))).fillna(method='ffill').fillna(0).sum(axis=1)

# Count the unique values produced by four ways of taking a window=1 rolling mean
print(len(df.rolling(window=1).mean().fillna(0).unique()))
print(len(df.rolling(window=1).apply(np.mean, raw=False).fillna(0).unique()))
print(len(df.rolling(window=1).apply(np.mean, raw=True).fillna(0).unique()))
print(len(df.rolling(window=1).apply(pd.Series.mean, raw=False).fillna(0).unique()))

Environment 1 Output

3.7.11 (default) [MSC v.1916 64 bit (AMD64)]
1.3.0
12
12
12
12

Environment 2 Output

3.7.11 (default) [MSC v.1916 64 bit (AMD64)]
0.25.1
14 # this is our culprit
12
12
12

So, there are two things you can do:

Either upgrade pandas to a more recent version, or,

if you must use pandas 0.25.1, use one of the apply variants shown above instead of rolling(...).mean(), which seems to be the source of this strange behaviour. For example:

print(len(df.rolling(window=1).apply(pd.Series.mean, raw=False).fillna(0).unique()))
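A further option, not tested in the environments above, is to keep rolling(window=1).mean() and round the result before calling unique(), so that values differing only by floating-point noise collapse into one. The sketch below assumes that 6 decimal places is an acceptable tolerance for this data; that figure is an arbitrary choice, not something given in the question. It also uses .ffill() and a list for the index instead of fillna(method='ffill') and a generator, which should behave the same here.

```python
import json
import pandas as pd

jsn_str = '{"user_1":{"77":4514.0,"44":7867.54,"67":10406.54,"12":7151.0,"56":1921.0,"36":9471.0,"47":2021.0,"25":3211.0,"26":2021.0,"15":4651.0,"71":8805.0,"62":352.0}}'

# Same preprocessing as in the question, written with .ffill()
series = (
    pd.DataFrame(json.loads(jsn_str), index=[str(x) for x in range(93)])
    .ffill()
    .fillna(0)
    .sum(axis=1)
)

rolled = series.rolling(window=1).mean().fillna(0)

# Assumption: 6 decimals is a safe tolerance for these values.
# Rounding removes noise in the last digits before the exact comparison.
unique_values = rolled.round(6).unique()
print(len(unique_values))
```

Rounding changes the stored values slightly, so it only fits cases where the chosen precision is coarser than the noise and finer than any real difference in the data.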
