Ehrendil

Reputation: 253

Resample a Python list with pandas

I'm fairly new to Python and pandas.

I run a query that returns a time series. I never know how many data points the query will return (it covers a single day), but I do know that I need to resample them to 24 points, one for each hour of the day.

Printing m3hstream gives

[(1479218009000L, 109), (1479287368000L, 84)]

Then I try to make a dataframe df with

df = pd.DataFrame(data = list(m3hstream), columns=['Timestamp', 'Value'])

and this gives me an output of

       Timestamp  Value
0  1479218009000    109
1  1479287368000     84

After that I do this:

 daily_summary = pd.DataFrame()
 daily_summary['value'] = df['Value'].resample('H').mean()
 daily_summary = daily_summary.truncate(before=start, after=end)
 print "Now daily summary"
 print daily_summary

But this is giving me a TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

Could anyone please let me know how to resample it so I have 1 point for each hour in the 24 hour period that I'm querying for?

Thanks.

Upvotes: 3

Views: 4349

Answers (2)

piRSquared

Reputation: 294536

  • First, convert the 'Timestamp' column to actual pd.Timestamp values. Those numbers look like milliseconds since the Unix epoch.
  • Then resample with the on parameter set to 'Timestamp'.

df = df.assign(
    Timestamp=pd.to_datetime(df.Timestamp, unit='ms')
).resample('H', on='Timestamp').mean().reset_index()

             Timestamp  Value
0  2016-11-15 13:00:00  109.0
1  2016-11-15 14:00:00    NaN
2  2016-11-15 15:00:00    NaN
3  2016-11-15 16:00:00    NaN
4  2016-11-15 17:00:00    NaN
5  2016-11-15 18:00:00    NaN
6  2016-11-15 19:00:00    NaN
7  2016-11-15 20:00:00    NaN
8  2016-11-15 21:00:00    NaN
9  2016-11-15 22:00:00    NaN
10 2016-11-15 23:00:00    NaN
11 2016-11-16 00:00:00    NaN
12 2016-11-16 01:00:00    NaN
13 2016-11-16 02:00:00    NaN
14 2016-11-16 03:00:00    NaN
15 2016-11-16 04:00:00    NaN
16 2016-11-16 05:00:00    NaN
17 2016-11-16 06:00:00    NaN
18 2016-11-16 07:00:00    NaN
19 2016-11-16 08:00:00    NaN
20 2016-11-16 09:00:00   84.0
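Note that resample only spans the hours between the first and last reading, so the result above has 21 rows rather than 24. One way to always get 24 points is to reindex against an explicit hourly range. The following is a minimal sketch, not the only approach: window_start is a hypothetical name standing in for the start of your queried day, and 'h' is the lowercase spelling of the hourly alias that recent pandas versions prefer over 'H'.

```python
import pandas as pd

# Sample data from the question: (epoch milliseconds, value).
m3hstream = [(1479218009000, 109), (1479287368000, 84)]

df = pd.DataFrame(m3hstream, columns=["Timestamp", "Value"])
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="ms")

# Hourly means; only covers the hours between the first and last reading.
hourly = df.resample("h", on="Timestamp")["Value"].mean()

# Assumption: the queried 24-hour window starts at this instant.
window_start = pd.Timestamp("2016-11-15 13:00:00")
full_day = pd.date_range(window_start, periods=24, freq="h")

# Reindexing pads the missing hours with NaN, giving exactly 24 rows.
daily_summary = hourly.reindex(full_day)
print(len(daily_summary))
```

Hours with no reading stay NaN after the reindex, so any filling step can be applied afterwards.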

If you want to fill those NaN values, use ffill (carry the last value forward), bfill (carry the next value backward), or interpolate (linear by default):

df.assign(
    Timestamp=pd.to_datetime(df.Timestamp, unit='ms')
).resample('H', on='Timestamp').mean().reset_index().interpolate()

             Timestamp   Value
0  2016-11-15 13:00:00  109.00
1  2016-11-15 14:00:00  107.75
2  2016-11-15 15:00:00  106.50
3  2016-11-15 16:00:00  105.25
4  2016-11-15 17:00:00  104.00
5  2016-11-15 18:00:00  102.75
6  2016-11-15 19:00:00  101.50
7  2016-11-15 20:00:00  100.25
8  2016-11-15 21:00:00   99.00
9  2016-11-15 22:00:00   97.75
10 2016-11-15 23:00:00   96.50
11 2016-11-16 00:00:00   95.25
12 2016-11-16 01:00:00   94.00
13 2016-11-16 02:00:00   92.75
14 2016-11-16 03:00:00   91.50
15 2016-11-16 04:00:00   90.25
16 2016-11-16 05:00:00   89.00
17 2016-11-16 06:00:00   87.75
18 2016-11-16 07:00:00   86.50
19 2016-11-16 08:00:00   85.25
20 2016-11-16 09:00:00   84.00
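To see how the three fill options differ, here is a small side-by-side sketch. The five-hour series is illustrative only (it reuses the two values from the question with a shorter gap), and the column names are just labels chosen for the comparison.

```python
import pandas as pd

# Illustrative hourly series: two known readings with a three-hour gap.
hours = pd.date_range("2016-11-15 13:00", periods=5, freq="h")
nan = float("nan")
hourly = pd.Series([109.0, nan, nan, nan, 84.0], index=hours)

filled = pd.DataFrame({
    "ffill": hourly.ffill(),              # repeat the last known value
    "bfill": hourly.bfill(),              # repeat the next known value
    "interpolate": hourly.interpolate(),  # straight line between readings
})
print(filled)
```

Which one is appropriate depends on what the readings mean: ffill suits values that hold until the next update, while interpolate suits quantities that change gradually.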

Upvotes: 3

Scott Boston

Reputation: 153550

Let's try:

daily_summary = df.set_index('Timestamp')

daily_summary.index = pd.to_datetime(daily_summary.index, unit='ms')

To get one point per hour:

daily_summary.resample('H').mean()

or one point per day:

daily_summary.resample('D').mean()
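Put together with the sample data from the question, the index-based approach might look like the sketch below. It assumes the timestamps are epoch milliseconds and starts from df, the frame that actually holds the 'Timestamp' column; 'h' is the lowercase hourly alias that recent pandas versions prefer over 'H'.

```python
import pandas as pd

# Sample data from the question: (epoch milliseconds, value).
m3hstream = [(1479218009000, 109), (1479287368000, 84)]
df = pd.DataFrame(m3hstream, columns=["Timestamp", "Value"])

# Move the timestamps into the index and convert them to datetimes,
# which gives resample the DatetimeIndex it requires.
daily_summary = df.set_index("Timestamp")
daily_summary.index = pd.to_datetime(daily_summary.index, unit="ms")

hourly = daily_summary.resample("h").mean()  # one row per hour
daily = daily_summary.resample("D").mean()   # one row per calendar day
print(hourly.head())
print(daily)
```

With this sample the two readings fall on different UTC calendar dates, so the daily resample returns two rows.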

Upvotes: 2
