Ross Middleton
Ross Middleton

Reputation: 237

Pandas Tick Data Averaging By Hour and Plotting For Each Week Of History

I have been following the answer here:

Pandas: how to plot yearly data on top of each other

Which takes a time series and plots the last data point for each day on a new plot. Each line on the plot represents a week's worth of data (so for example 5 data points per week):

enter image description here

I used the following code to do this:

#Chart by last price
daily = ts.groupby(lambda x: x.isocalendar()[1:]).agg(lambda s: s[-1])
daily.index = pd.MultiIndex.from_tuples(daily.index, names=['W', 'D'])
dofw = "Mon Tue Wed Thu Fri Sat Sun".split()
grid = daily.unstack('D').rename(columns=lambda x: dofw[x-1])
grid[-5:].T.plot()

What I would like to do is instead of aggregating by the last data point in a day I would like to aggregateby hour (so averaging the data for each hour) and chart the hourly data for each week. So the chart will look similar to the one in the linked image only it will have 24 data points per day per line and not just one data point per day per line

Is there any way that I can paste the Pandas DataFrame into this post? When I click copy paste it pastes as a list

EDIT:

Final code taking into account incomplete data on for the latest week for charting purposes:

# First we read the DataFrame and resample it to get a mean on every hour
df = pd.read_csv(r"MYFILE.csv", header=None,
                 parse_dates=[0], index_col=0).resample('H', how='mean').dropna()
# Then we add a week field so we can filter it by the week
df['week']= df.index.map(lambda x: x.isocalendar()[1])
start_range = list(set(df['week']))[-3]
end_range = list(set(df['week']))[-1]
# Create week labels
weekdays = 'Mon Tue Wed Thu Fri Sat Sun'.split()

# Create the figure
fig, ax = plt.subplots()

# For every week we want to plot
for week in range(start_range,end_range+1):
    # Select out the week
    dfw = df[df['week'] == week].copy()
    # Here we align all the weeks to span over the same time period so they
    # can be shown on the graph one over the other, and not one next to
    # the other.
    dfw['timestamp'] = dfw.index.values - (week * np.timedelta64(1, 'W'))
    dfw = dfw.set_index(['timestamp'])
    # Then we plot our data
    ax.plot(dfw.index, dfw[1], label='week %s' % week)
    # Now to set the x labels. First we resample the timestamp to have
    # a date frequency, and set it to be the xtick values
    if week == end_range:
        resampled = resampled.index + pd.DateOffset(weeks=1)
    else:        
        resampled = dfw.resample('D')
   # newresampled = resampled.index + pd.DateOffset(weeks=1)
    ax.set_xticks(resampled.index.values)
    # But change the xtick labels to be the weekdays.
    ax.set_xticklabels(weekdays)
# Plot the legend
plt.legend()

Upvotes: 4

Views: 3454

Answers (2)

Viktor Kerkez
Viktor Kerkez

Reputation: 46636

The solution is explained in the code.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# First we read the DataFrame and resample it to get a mean on every hour
df = pd.read_csv('trayport.csv', header=None,
                 parse_dates=[0], index_col=0).resample('H', how='mean').dropna()
# Then we add a week field so we can filter it by the week
df['week']= df.index.map(lambda x: x.isocalendar()[1])

# Create week labels
weekdays = 'Mon Tue Wed Thu Fri Sat Sun'.split()

# Create the figure
fig, ax = plt.subplots()

# For every week we want to plot
for week in range(1, 4):
    # Select out the week
    dfw = df[df['week'] == week].copy()
    # Here we align all the weeks to span over the same time period so they
    # can be shown on the graph one over the other, and not one next to
    # the other.
    dfw['timestamp'] = dfw.index.values - (week * np.timedelta64(1, 'W'))
    dfw = dfw.set_index(['timestamp'])
    # Then we plot our data
    ax.plot(dfw.index, dfw[1], label='week %s' % week)
    # Now to set the x labels. First we resample the timestamp to have
    # a date frequency, and set it to be the xtick values
    resampled = dfw.resample('D')
    ax.set_xticks(resampled.index.values)
    # But change the xtick labels to be the weekdays.
    ax.set_xticklabels(weekdays)
# Plot the legend
plt.legend()

The result looks like:

enter image description here

Upvotes: 4

Andy Hayden
Andy Hayden

Reputation: 375805

You can use the resample (DataFrame or Series) method:

df.resample('H')

by default it uses the how='mean' (ie this will average results by hour).

Upvotes: 1

Related Questions