wafflecup

Reputation: 13

Standardizing timeseries in Pandas using interpolation

First timer, be nice!

Related to a question with slightly different variable types: Pandas timeseries resampling and interpolating together

I have collected sensor data with very irregular timestamps, and I want the readings at consistent 1-minute intervals. As far as I can tell, the only way to preserve the distribution and have a value at every minute is to interpolate.

The data set is a million rows long, but this is a header preview (the sensor records in ISO timestamps):

   Raw                             DataValue
0 2016-05-01T00:00:59.3+10:00    354.9819946
1 2016-05-01T00:02:59.4+10:00    354.9819946
2 2016-05-01T00:03:59.4+10:00    350.6199951
3 2016-05-01T00:13:00.1+10:00    351.4880066
4 2016-05-01T00:22:00.5+10:00    352.9719849
5 2016-05-01T00:31:01.1+10:00    352.0710144

My current code is below. I am using pandas and numpy:

data = pd.read_csv('C:/Users/user/Documents/Data/cleaneddata1.csv',
                   parse_dates=True)

data['Raw'].index = pd.to_datetime(data['Raw'].index)

d = data.set_index('Raw')
t = d.index
r = pd.date_range(t.min().date(), periods=(len(data)), freq='T')

d.reindex(t.union(r)).interpolate('index').ix[r]

It doesn't work. It returns:

r = pd.date_range(t.min().date(), periods=(len(data)), freq='T')
AttributeError: 'str' object has no attribute 'date'

This has been driving me crazy, I am unsure whether the 'str' it refers to is associated with the ISO timestamps.

Upvotes: 1

Views: 300

Answers (1)

cs95

Reputation: 402273

You're looking for:

data['Raw'] = pd.to_datetime(data['Raw'])

Raw is a column, so data['Raw'] returns a Series; you want to convert that Series, not its index. Once that's done, I'd recommend putting the data on a regular 1-minute grid with df.resample:
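To see where the 'str' in the error comes from, here is a minimal sketch using the six rows from the question's preview. The inline CSV text and the names raw_type_before and start_date are only for illustration. As documented for read_csv, parse_dates=True only tries to parse the index, so without an index_col the Raw column is read as plain strings:

```python
import io
import pandas as pd

# Rows copied from the question's preview, inlined so the sketch is self-contained
csv_text = """Raw,DataValue
2016-05-01T00:00:59.3+10:00,354.9819946
2016-05-01T00:02:59.4+10:00,354.9819946
2016-05-01T00:03:59.4+10:00,350.6199951
2016-05-01T00:13:00.1+10:00,351.4880066
2016-05-01T00:22:00.5+10:00,352.9719849
2016-05-01T00:31:01.1+10:00,352.0710144
"""

data = pd.read_csv(io.StringIO(csv_text), parse_dates=True)

# Raw was not parsed: each element is still a Python str
raw_type_before = type(data['Raw'].iloc[0])

# Convert the column itself (not its index) to timezone-aware datetimes
data['Raw'] = pd.to_datetime(data['Raw'])

# The index now holds Timestamps, so .min().date() works
t = data.set_index('Raw').index
start_date = t.min().date()
print(raw_type_before, start_date)
```

With Raw still holding strings, set_index('Raw') produces an index of strings, and t.min() returns a str, which has no .date() method.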

data = data.set_index('Raw').resample('1min').mean()

Minutes that contain no readings come back as NaN. If you still want interpolated values there, you can chain .interpolate() after .mean().
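As a rough sketch of the two ways to get a value at every minute, using the six preview rows from the question: option 1 averages within each 1-minute bin and then fills the empty bins linearly; option 2 follows the reindex-on-a-union idea from the question's own code and interpolates by time onto exact minute marks. The names per_minute, filled, grid and on_grid are illustrative, and the grid here spans only the observed time range, which is an assumption about what is wanted.

```python
import io
import pandas as pd

# Rows copied from the question's preview, inlined so the sketch is self-contained
csv_text = """Raw,DataValue
2016-05-01T00:00:59.3+10:00,354.9819946
2016-05-01T00:02:59.4+10:00,354.9819946
2016-05-01T00:03:59.4+10:00,350.6199951
2016-05-01T00:13:00.1+10:00,351.4880066
2016-05-01T00:22:00.5+10:00,352.9719849
2016-05-01T00:31:01.1+10:00,352.0710144
"""

data = pd.read_csv(io.StringIO(csv_text))
data['Raw'] = pd.to_datetime(data['Raw'])
d = data.set_index('Raw')

# Option 1: mean per 1-minute bin, then linear fill of the empty bins
per_minute = d.resample('1min').mean()
filled = per_minute.interpolate()

# Option 2: time-weighted interpolation onto exact minute marks
grid = pd.date_range(d.index.min().floor('min'),
                     d.index.max().ceil('min'), freq='min')
on_grid = (d.reindex(d.index.union(grid))  # original stamps plus grid points
             .interpolate(method='time')   # weight by actual time gaps
             .loc[grid])                   # keep only the grid points

print(filled.head())
print(on_grid.head())
```

Option 1 treats each reading as if it sat on the start of its minute, while option 2 respects the sub-minute offsets of the original timestamps. In option 2, a grid point that falls before the first reading has nothing to interpolate from and stays NaN.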


Since you want to retain the original Raw column, you could instead use:

data = data.assign(RawDt=pd.to_datetime(data.Raw))\
       .groupby(pd.Grouper(key='RawDt', freq='1min'))\
       .agg({'DataValue' : 'mean', 'Raw' : 'first'}).reset_index(drop=True)

Upvotes: 2
