Resample python list with pandas

Question

I'm fairly new to Python and pandas.

I run a query that returns a time series. I never know in advance how many data points the query (run for a single day) will return, but I do know that I need to resample them to exactly 24 points, one for each hour of the day.

Printing m3hstream gives:

[(1479218009000L, 109), (1479287368000L, 84)]

Then I try to make a dataframe df with

df = pd.DataFrame(data = list(m3hstream), columns=['Timestamp', 'Value'])

and this gives me an output of

       Timestamp  Value
0  1479218009000    109
1  1479287368000     84

Next, I do this:

daily_summary = pd.DataFrame()
daily_summary['value'] = df['Value'].resample('H').mean()
daily_summary = daily_summary.truncate(before=start, after=end)
print "Now daily summary"
print daily_summary

But this raises TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

How can I resample this so that I get one point for each hour of the 24-hour period I'm querying?

Thanks.
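Why the error occurs: resample needs a datetime-like index (or a datetime column named via on=), and a DataFrame built from a plain list gets a default RangeIndex. The sketch below reproduces that with the two sample rows and then fixes it by converting the millisecond epochs and setting them as the index, which is an alternative to the on= parameter. It assumes Python 3 (so no L suffix on the integers) and a recent pandas, and it uses the lowercase 'h' hourly alias, which pandas 2.2+ prefers over 'H'.

```python
import pandas as pd

# The two (epoch milliseconds, value) pairs printed in the question
m3hstream = [(1479218009000, 109), (1479287368000, 84)]
df = pd.DataFrame(data=list(m3hstream), columns=['Timestamp', 'Value'])

# A DataFrame built from a list gets a default RangeIndex
print(type(df.index).__name__)  # RangeIndex

# resample() rejects a RangeIndex, which is the TypeError from the question
try:
    df['Value'].resample('h').mean()
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

# Convert the millisecond epochs to datetimes and use them as the index
df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='ms')
indexed = df.set_index('Timestamp')

# With a DatetimeIndex, hourly resampling works; hours without data become NaN
hourly = indexed['Value'].resample('h').mean()
print(len(hourly))  # 21
```

Setting the index and passing on='Timestamp' should give the same hourly bins; on= just avoids changing the DataFrame's index.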

piRSquared · Accepted Answer

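First, convert the 'Timestamp' column to real datetimes (pd.Timestamp values). Those numbers look like milliseconds since the epoch, so pass unit='ms' to pd.to_datetime.
Then resample with the on parameter set to 'Timestamp', so resample bins on that column instead of the RangeIndex: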

df = df.assign(
    Timestamp=pd.to_datetime(df.Timestamp, unit='ms')
).resample('H', on='Timestamp').mean().reset_index()

             Timestamp  Value
0  2016-11-15 13:00:00  109.0
1  2016-11-15 14:00:00    NaN
2  2016-11-15 15:00:00    NaN
3  2016-11-15 16:00:00    NaN
4  2016-11-15 17:00:00    NaN
5  2016-11-15 18:00:00    NaN
6  2016-11-15 19:00:00    NaN
7  2016-11-15 20:00:00    NaN
8  2016-11-15 21:00:00    NaN
9  2016-11-15 22:00:00    NaN
10 2016-11-15 23:00:00    NaN
11 2016-11-16 00:00:00    NaN
12 2016-11-16 01:00:00    NaN
13 2016-11-16 02:00:00    NaN
14 2016-11-16 03:00:00    NaN
15 2016-11-16 04:00:00    NaN
16 2016-11-16 05:00:00    NaN
17 2016-11-16 06:00:00    NaN
18 2016-11-16 07:00:00    NaN
19 2016-11-16 08:00:00    NaN
20 2016-11-16 09:00:00   84.0
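The result above has 21 rows because resample only spans from the first to the last observation. To get exactly 24 points for the queried period, as the question asks, one option (not part of the original answer) is to reindex onto a fixed hourly range built with pd.date_range. This is a minimal sketch: start is a hypothetical stand-in for the question's start, taken here as the hour of the first sample purely for illustration.

```python
import pandas as pd

m3hstream = [(1479218009000, 109), (1479287368000, 84)]
df = pd.DataFrame(data=list(m3hstream), columns=['Timestamp', 'Value'])

# Same conversion and hourly resample as in the answer
hourly = df.assign(
    Timestamp=pd.to_datetime(df.Timestamp, unit='ms')
).resample('h', on='Timestamp').mean()

# Hypothetical start of the 24-hour query window (here: the first hour seen)
start = hourly.index[0]
# Exactly 24 hourly timestamps: start, start + 1h, ..., start + 23h
full_window = pd.date_range(start=start, periods=24, freq='h')

# Hours with no data become NaN; hours outside the window are dropped
daily_summary = hourly.reindex(full_window)
print(len(daily_summary))  # 24
```

Because reindex both pads and trims, this can also stand in for the truncate(before=start, after=end) step from the question, provided start is aligned to a whole hour.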

If you want to fill those NaN values, use ffill, bfill, or interpolate:

df.assign(
    Timestamp=pd.to_datetime(df.Timestamp, unit='ms')
).resample('H', on='Timestamp').mean().reset_index().interpolate()

             Timestamp   Value
0  2016-11-15 13:00:00  109.00
1  2016-11-15 14:00:00  107.75
2  2016-11-15 15:00:00  106.50
3  2016-11-15 16:00:00  105.25
4  2016-11-15 17:00:00  104.00
5  2016-11-15 18:00:00  102.75
6  2016-11-15 19:00:00  101.50
7  2016-11-15 20:00:00  100.25
8  2016-11-15 21:00:00   99.00
9  2016-11-15 22:00:00   97.75
10 2016-11-15 23:00:00   96.50
11 2016-11-16 00:00:00   95.25
12 2016-11-16 01:00:00   94.00
13 2016-11-16 02:00:00   92.75
14 2016-11-16 03:00:00   91.50
15 2016-11-16 04:00:00   90.25
16 2016-11-16 05:00:00   89.00
17 2016-11-16 06:00:00   87.75
18 2016-11-16 07:00:00   86.50
19 2016-11-16 08:00:00   85.25
20 2016-11-16 09:00:00   84.00
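The sentence above also mentions ffill and bfill. The sketch below compares all three fills on the same hourly Value series. Which one fits depends on the data: ffill holds each reading until the next one, bfill applies the next reading to the preceding gap, and interpolate draws a straight line between readings (treating the rows as equally spaced). It assumes the same sample data and the 'h' alias.

```python
import pandas as pd

m3hstream = [(1479218009000, 109), (1479287368000, 84)]
df = pd.DataFrame(data=list(m3hstream), columns=['Timestamp', 'Value'])

# Hourly Value series with NaN for the hours that had no data
hourly = df.assign(
    Timestamp=pd.to_datetime(df.Timestamp, unit='ms')
).resample('h', on='Timestamp').mean()['Value']

filled = pd.DataFrame({
    'ffill': hourly.ffill(),              # carry the last reading forward
    'bfill': hourly.bfill(),              # pull the next reading backward
    'interpolate': hourly.interpolate(),  # linear between the two readings
})
print(filled.head())
```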
