candid


Pandas rolling mean: don't change numbers to NaN in DataFrame

I'm working with a pandas DataFrame which looks like this:

(N.B. the offset is set as the index of the DataFrame)

offset         X         Y         Z
  0   -0.140137   -1.924316   -0.426758
 10   -2.789123   -1.111212   -0.416016
 20   -0.133789   -1.923828   -4.408691
 30   -0.101112   -1.457891   -0.425781
 40   -0.126465   -1.926758   -0.414062
 50   -0.137207   -1.916992   -0.404297
 60   -0.130371   -3.784591   -0.987654
 70   -0.125000   -1.918457   -0.403809
 80   -0.123456   -1.917480   -0.413574
 90   -0.126465   -1.926758   -0.333554

I have applied a rolling mean with window size 5 to the DataFrame using the following code. I need to keep the window size at 5, and I need values for every offset in the DataFrame (no NaNs).

df = df.rolling(center=False, window=5).mean()

Which gives me:

offset         X         Y         Z
 0.0       NaN       NaN       NaN
10.0       NaN       NaN       NaN
20.0       NaN       NaN       NaN
30.0       NaN       NaN       NaN
40.0 -0.658125 -1.668801 -1.218262
50.0 -0.657539 -1.667336 -1.213769
60.0 -0.125789 -2.202012 -1.328097
70.0 -0.124031 -2.200938 -0.527121
80.0 -0.128500 -2.292856 -0.524679
90.0 -0.128500 -2.292856 -0.508578
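For reference, the four leading NaN rows follow from the window size: with an integer window, pandas defaults min_periods to the window size, so a row only gets a value once 5 observations are available. A minimal sketch reproducing this, assuming the sample data re-typed from the table above with offset as the index:

```python
import pandas as pd

# Sample data re-typed from the question; "offset" is the index.
df = pd.DataFrame(
    {
        "X": [-0.140137, -2.789123, -0.133789, -0.101112, -0.126465,
              -0.137207, -0.130371, -0.125000, -0.123456, -0.126465],
        "Y": [-1.924316, -1.111212, -1.923828, -1.457891, -1.926758,
              -1.916992, -3.784591, -1.918457, -1.917480, -1.926758],
        "Z": [-0.426758, -0.416016, -4.408691, -0.425781, -0.414062,
              -0.404297, -0.987654, -0.403809, -0.413574, -0.333554],
    },
    index=pd.Index(range(0, 100, 10), name="offset"),
)

window = 5
rolled = df.rolling(window=window).mean()

# min_periods defaults to window for an integer window, so a row needs a
# full window of 5 observations; the first window - 1 rows come out NaN.
n_nan_rows = int(rolled.isna().all(axis=1).sum())
print(n_nan_rows)  # 4
```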

I would like the rows that come out as NaN to keep their original values unchanged, and the rest of the values to be the result of the rolling mean. Is there a simple way to do this? Thanks

i.e. the desired output:

offset         X         Y         Z
 0.0  -0.140137  -1.924316  -0.426758
10.0  -2.789123  -1.111212  -0.416016
20.0  -0.133789  -1.923828  -4.408691
30.0  -0.101112  -1.457891  -0.425781
40.0  -0.658125  -1.668801  -1.218262
50.0  -0.657539  -1.667336  -1.213769
60.0  -0.125789  -2.202012  -1.328097
70.0  -0.124031  -2.200938  -0.527121
80.0  -0.128500  -2.292856  -0.524679
90.0  -0.128500  -2.292856  -0.508578

Upvotes: 3

Views: 2338

Answers (2)

thorbjorn444


Assuming you don't have other rows that are all NaN, you can identify which rows are all NaN in df_rolling and replace them with the corresponding rows from the original. Example:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(13, 5))
df_rolling = df.rolling(center=False, window=5).mean()
# identify which rows are all NaN
idx = df_rolling.index[df_rolling.isnull().all(axis=1)]
# replace those rows with the original data
df_rolling.loc[idx, :] = df.loc[idx, :]
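A caveat on the all-NaN assumption: if the original data ever has a fully missing row, every rolling window that covers it is also all NaN, and the mask would copy raw values into those rows. The sketch below compares that mask with an alternative (not the method from this answer) that overwrites only the first window - 1 rows by position; the 13x3 random frame and the missing row 8 are hypothetical test data:

```python
import numpy as np
import pandas as pd

window = 5
rng = np.random.default_rng(0)
# Hypothetical test data: 13 rows, with row 8 entirely missing.
df = pd.DataFrame(rng.random((13, 3)), columns=["X", "Y", "Z"])
df.iloc[8] = np.nan

df_rolling = df.rolling(window=window).mean()

# The answer's approach: replace every all-NaN row with the original row.
by_mask = df_rolling.copy()
idx = by_mask.index[by_mask.isna().all(axis=1)]
by_mask.loc[idx, :] = df.loc[idx, :]
print(idx.tolist())  # [0, 1, 2, 3, 8, 9, 10, 11, 12]

# Positional alternative: overwrite only the first window - 1 rows,
# leaving the NaNs caused by the gap at row 8 in place.
by_position = df_rolling.copy()
by_position.iloc[: window - 1] = df.iloc[: window - 1].to_numpy()
```

In by_mask, rows 9 to 12 end up holding raw values even though they are past the warm-up period; by_position keeps them NaN.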

Upvotes: 0

user2285236


You can fill the NaNs with the corresponding values from the original df:

df.rolling(center=False, window=5).mean().fillna(df)
Out: 
               X         Y         Z
offset                              
0      -0.140137 -1.924316 -0.426758
10     -2.789123 -1.111212 -0.416016
20     -0.133789 -1.923828 -4.408691
30     -0.101112 -1.457891 -0.425781
40     -0.658125 -1.668801 -1.218262
50     -0.657539 -1.667336 -1.213769
60     -0.125789 -2.202012 -1.328097
70     -0.124031 -2.200938 -0.527121
80     -0.128500 -2.292856 -0.524679
90     -0.128500 -2.292856 -0.508578
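fillna accepts a DataFrame and fills each NaN from the cell with the same index label and column in the other frame, so the offsets line up automatically. A small sketch on the question's data (re-typed, offset as index), also showing that combine_first gives the same result here because both frames share the same index and columns:

```python
import pandas as pd

# Sample data re-typed from the question; "offset" is the index.
df = pd.DataFrame(
    {
        "X": [-0.140137, -2.789123, -0.133789, -0.101112, -0.126465,
              -0.137207, -0.130371, -0.125000, -0.123456, -0.126465],
        "Y": [-1.924316, -1.111212, -1.923828, -1.457891, -1.926758,
              -1.916992, -3.784591, -1.918457, -1.917480, -1.926758],
        "Z": [-0.426758, -0.416016, -4.408691, -0.425781, -0.414062,
              -0.404297, -0.987654, -0.403809, -0.413574, -0.333554],
    },
    index=pd.Index(range(0, 100, 10), name="offset"),
)

rolled = df.rolling(window=5).mean()

# Each NaN in `rolled` is filled from the cell with the same offset
# and column in df; non-NaN rolling values are left untouched.
filled = rolled.fillna(df)

# combine_first also fills nulls from the other frame by label.
alt = rolled.combine_first(df)
```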

There is also a min_periods argument you can use. If you pass min_periods=1, the first value is taken as it is, the second is the mean of the first two, and so on until the window is full. This may make more sense when you want the early rows smoothed too rather than left as raw values.

df.rolling(center=False, window=5, min_periods=1).mean()
Out: 
               X         Y         Z
offset                              
0      -0.140137 -1.924316 -0.426758
10     -1.464630 -1.517764 -0.421387
20     -1.021016 -1.653119 -1.750488
30     -0.791040 -1.604312 -1.419311
40     -0.658125 -1.668801 -1.218262
50     -0.657539 -1.667336 -1.213769
60     -0.125789 -2.202012 -1.328097
70     -0.124031 -2.200938 -0.527121
80     -0.128500 -2.292856 -0.524679
90     -0.128500 -2.292856 -0.508578
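The min_periods=1 behaviour can be checked directly: over the first window - 1 rows it matches an expanding mean (the mean of all rows so far), and from the fifth row on it is identical to the default rolling mean. A sketch on the question's sample data (re-typed, offset as index):

```python
import numpy as np
import pandas as pd

# Sample data re-typed from the question; "offset" is the index.
df = pd.DataFrame(
    {
        "X": [-0.140137, -2.789123, -0.133789, -0.101112, -0.126465,
              -0.137207, -0.130371, -0.125000, -0.123456, -0.126465],
        "Y": [-1.924316, -1.111212, -1.923828, -1.457891, -1.926758,
              -1.916992, -3.784591, -1.918457, -1.917480, -1.926758],
        "Z": [-0.426758, -0.416016, -4.408691, -0.425781, -0.414062,
              -0.404297, -0.987654, -0.403809, -0.413574, -0.333554],
    },
    index=pd.Index(range(0, 100, 10), name="offset"),
)

window = 5
partial = df.rolling(window=window, min_periods=1).mean()

# Before the window is full, min_periods=1 averages all rows seen so far,
# which is what an expanding mean computes.
expanding = df.expanding().mean()
head_matches = np.allclose(partial.iloc[: window - 1].to_numpy(),
                           expanding.iloc[: window - 1].to_numpy())

# Once 5 rows are available, the result equals the default rolling mean.
full = df.rolling(window=window).mean()
tail_matches = np.allclose(partial.iloc[window - 1:].to_numpy(),
                           full.iloc[window - 1:].to_numpy())
print(head_matches, tail_matches)  # True True
```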

Upvotes: 6
