Pandas: calculating average behaviour of time series data around an event

Question

I have two dataframes with date time information, df_stream is a stream of events and df_events are specific events in time. For example:

The blue is the stream and the red line is the event. The window is 30min either side of the event and the time granularity is 30s.

How do I produce a dataframe / graph of the average behaviour of the stream for all events for a given time window and time granularity?

The stream dataframe looks like:

                     streamEventId
DateTime                          
2020-08-20 10:39:24       44791313
2020-08-20 10:40:30       44791721
2020-08-20 10:40:54       44694121
2020-08-20 10:41:16       44902962
2020-08-20 10:42:04       44622569

The event dataframe looks like:

            DateTime
0 2020-11-17 09:49:00
1 2020-11-17 10:49:00
2 2020-11-17 11:11:00
3 2020-11-17 11:16:00
4 2020-11-17 12:11:00

I've managed to get graphs of each event with their respective windows printed but am struggling to combine the logic. Also I use iterrows which I am not a fan of.

My current approach:

for i in df_events[["DateTime"]].iterrows():
    date_time = i[1].values[0]
    before = date_time - pd.Timedelta(window)
    after = date_time + pd.Timedelta(window)

    df_stream_temp = df_stream.loc[before:after].copy()
    plt.figure(figsize=(20, 2))
    df_stream_mva = (
        df_stream.streamEventId.groupby(pd.Grouper(freq="30s"))
        .count()
        .loc[before:after]
    )

    y_height = df_stream_mva.max()

    ax = df_stream_mva.plot()
    plt.vlines(df_events.DateTime.to_list(), 0, y_height, color="lightcoral")
    ax.set_ylim([0, y_height])

Which gives a nice series of graphs:

And so on...

I would like to calculate and then plot the average of the above with confidence intervals.

philMarius · Accepted Answer

Sat down and worked it out.

Using a similar logic to my iterrows example above, I looped through df_events, created before and after time window limits and used these to filter df_stream grouped by a given time frequency. Then recalculated the index using time delta indexes to get a "distance from event" index. This was then appended to a list that was then combined using pd.concat with an axis of 1. Example below:

window = "30 min"
freq = "30s"

collect_list = []
for i in df_events[["DateTime"]].iterrows():
    date_time = i[1].values[0]
    before = date_time - pd.Timedelta(window)
    after = date_time + pd.Timedelta(window)

    df_stream_window = (
        df_stream.streamEventId.groupby(pd.Grouper(freq=freq))
        .count()
        .loc[before:after]
        .reset_index()
        .rename(columns={"streamEventId": "stream events"})
    )

    df_stream_window = df_stream_window.set_index(
        pd.to_timedelta((df_stream_window.DateTime - pd.to_datetime(date_time)))
        / np.timedelta64("1", "m")
    )

    collect_list.append(df_stream_window)

df_collect = pd.concat(collect_list, axis=1)

This generated the below dataframe:

                    DateTime  stream events            DateTime  \
DateTime                                                          
-30.0    2020-11-17 09:19:00              5 2020-11-17 10:19:00   
-29.5    2020-11-17 09:19:30              5 2020-11-17 10:19:30   
-29.0    2020-11-17 09:20:00              3 2020-11-17 10:20:00   
-28.5    2020-11-17 09:20:30              2 2020-11-17 10:20:30   
-28.0    2020-11-17 09:21:00              3 2020-11-17 10:21:00   

        stream events            DateTime  stream events  \
DateTime                                                     
-30.0                 3 2020-11-17 10:41:00              6   
-29.5                 5 2020-11-17 10:41:30              6   
-29.0                 3 2020-11-17 10:42:00              4   
-28.5                 3 2020-11-17 10:42:30              1   
-28.0                 1 2020-11-17 10:43:00              4   

....

This can then be used in plotting with estimator="mean" in Seaborn:

ax = sns.lineplot(data=df_collect, estimator="mean")
plt.axvline(0, color="lightcoral")
ax.set_xlabel("Time before / after event (min)")
ax.set_ylabel("# Stream Events")

Pandas: calculating average behaviour of time series data around an event

Answers (1)

Related Questions