Harry

Reputation: 299

pandas rolling mean didn't work for my time data series

I'm writing a moving average function on time series:

import pandas

def datedat_moving_mean(datedat, window):
    # window is the number of points to average over
    datedatF = pandas.DataFrame(datedat)
    return datedatF.rolling(window).mean().values

The code above is copied from "Moving Average - Pandas".

Then I apply this function to this time series:

import datetime
import numpy

datedat1 = numpy.array(
    [pandas.date_range(start=datetime.datetime(2015, 1, 30), periods=17),
     numpy.random.rand(17)]).T

However, datedat_moving_mean(datedat1, 4) just returns the original datedat1. Nothing was averaged! What's wrong?

Upvotes: 0

Views: 4280

Answers (1)

emmet02

Reputation: 942

The DataFrame you construct has no explicit index (it defaults to integers) and contains one column of Timestamps and one column of floats.

I imagine that you want to use the Timestamps as the index; even if you don't, you will need to set them as the index in order to use .rolling() on the frame.

I would suggest initialising the original DataFrame more like this:
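To see the structure described above, the frame can be inspected directly. This is only a minimal sketch: the 5-row length and the names `dates`, `values`, `arr` and `frame` are illustrative assumptions, and the mixed array is filled in by hand (as an object array) rather than with the `numpy.array([...]).T` call from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical small stand-in for the question's data: 5 dates and 5 floats.
dates = pd.date_range(start="2015-01-30", periods=5)
values = np.random.rand(5)

# A 2-D array holding both Timestamps and floats has to be an object array.
arr = np.empty((5, 2), dtype=object)
arr[:, 0] = list(dates)
arr[:, 1] = values

frame = pd.DataFrame(arr)

# The index is the default integer range, not the dates.
print(type(frame.index))
# The float column is stored with object dtype, not float64.
print(frame.dtypes)
```

Checking `frame.dtypes` and `frame.index` like this is a quick way to confirm whether the dates ended up as the index or as an ordinary column.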

import datetime
import pandas as pd
import numpy as np

df = pd.DataFrame(data=np.random.rand(17), index=pd.date_range(start=datetime.datetime(2015, 1, 30), periods=17))
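With a frame built this way, the rolling mean can be taken directly. The sketch below assumes a window of 4 rows (as in the question) and a column name `value`, which is only there for illustration.

```python
import datetime

import numpy as np
import pandas as pd

# Dates become the index; the random floats are the only column.
index = pd.date_range(start=datetime.datetime(2015, 1, 30), periods=17)
df = pd.DataFrame(data=np.random.rand(17), index=index, columns=["value"])

# 4-row moving average. The first 3 rows are NaN because the window
# is not yet full there.
smoothed = df.rolling(4).mean()
print(smoothed.head(6))
```

If a plain array is needed afterwards, `smoothed.values` gives the averaged floats and `smoothed.index` gives the matching dates.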

If, however, you would rather leave the DataFrame un-indexed, you can work around the rolling issue by temporarily setting the index to the Timestamp column:

import datetime
import pandas as pd
import numpy as np

def datedat_moving_mean(datedat, window):
    datedatF = pd.DataFrame(datedat)
    # We can temporarily set the index, compute the rolling mean, and then
    # return the values of the entire DataFrame.
    # The mixed array gives object-dtype columns, so cast to float first.
    indexed = datedatF.set_index(0).astype(float)
    return indexed.rolling(window).mean().reset_index().values

datedat1 = np.array([pd.date_range(start=datetime.datetime(2015, 1, 30), periods=17), np.random.rand(17)]).T
vals = datedat_moving_mean(datedat1, 5)

I would suggest, however, that creating the DataFrame with an index is the better option (consider what happens if the datetimes are not sorted and you call rolling on the DataFrame).
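To illustrate the point about unsorted datetimes: an integer window averages neighbouring rows in whatever order they are stored, so one option is to sort by the index before calling rolling. The sketch below is an assumption-laden example, not part of the original code: the column name `value`, the shuffling via `sample`, and the deterministic 0..16 data are all chosen purely so the result is easy to check.

```python
import numpy as np
import pandas as pd

index = pd.date_range(start="2015-01-30", periods=17)
df = pd.DataFrame({"value": np.arange(17, dtype=float)}, index=index)

# Shuffle the rows to simulate input whose dates are out of order.
shuffled = df.sample(frac=1, random_state=0)

# Sorting by the DatetimeIndex restores chronological order, so each
# 4-row window covers four consecutive dates.
smoothed = shuffled.sort_index().rolling(4).mean()
print(smoothed.tail(3))
```

Having the dates in the index is what makes `sort_index()` available as a one-step fix here.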

Upvotes: 1
