Jerad
Jerad

Reputation: 603

How to calculate occurrences between evenly-spaced intervals with set start/end points using Pandas?

I'm trying to create a timeline of events over the past day with a series of dates. Each date is considered an occurrence of an event. Occurrences need to be grouped by the hour. I need to include zero values in the timeline.

Sample Data

items = [
    datetime.datetime(2018, 3, 19, 16, 51, 48),
    datetime.datetime(2018, 3, 19, 17, 25, 19),
    datetime.datetime(2018, 3, 20, 6, 33, 35),
    datetime.datetime(2018, 3, 19, 23, 21, 35),
    datetime.datetime(2018, 3, 19, 15, 8, 41),
    datetime.datetime(2018, 3, 19, 21, 44, 16),
    datetime.datetime(2018, 3, 19, 18, 21, 28),
    datetime.datetime(2018, 3, 20, 7, 20, 22),
    datetime.datetime(2018, 3, 20, 11, 15, 43)
]

Right now, I have it working, but this doesn't feel like the correct way to do this. Any advice?

Current solution

import pandas as pd

def _generate_timeseries(items, start_ts, end_ts):
    # add start/end times to the data
    items.insert(0, start_ts)
    items.append(end_ts)

    # value each datetime as one occurrence
    data = [1 for x in range(len(items))]
    timeseries = pd.Series(data, index=items)
    hourly_data = timeseries.resample('H').sum()

    timeline = hourly_data.tolist()
    return [{'mentions': x} for x in timeline[1:-1]]

Results Example

timeline =[
    {'mentions': 4}, {'mentions': 2}, {'mentions': 1}, {'mentions': 0}, {'mentions': 3}, {'mentions': 2}, {'mentions': 2}, {'mentions': 1}, {'mentions': 1}, {'mentions': 0}, {'mentions': 1}, {'mentions': 0}, {'mentions': 14}, {'mentions': 1}, {'mentions': 4}, {'mentions': 2}, {'mentions': 3}, {'mentions': 2}, {'mentions': 1}, {'mentions': 2}, {'mentions': 6}, {'mentions': 2}, {'mentions': 2}
]

Upvotes: 0

Views: 271

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

IIUC, you can use this:

df = pd.DataFrame({'Event':items})
df.groupby(pd.Grouper(key='Event',freq='H'))['Event'].count()

Output:

Event
2018-03-19 15:00:00    1
2018-03-19 16:00:00    1
2018-03-19 17:00:00    1
2018-03-19 18:00:00    1
2018-03-19 19:00:00    0
2018-03-19 20:00:00    0
2018-03-19 21:00:00    1
2018-03-19 22:00:00    0
2018-03-19 23:00:00    1
2018-03-20 00:00:00    0
2018-03-20 01:00:00    0
2018-03-20 02:00:00    0
2018-03-20 03:00:00    0
2018-03-20 04:00:00    0
2018-03-20 05:00:00    0
2018-03-20 06:00:00    1
2018-03-20 07:00:00    1
2018-03-20 08:00:00    0
2018-03-20 09:00:00    0
2018-03-20 10:00:00    0
2018-03-20 11:00:00    1
Freq: H, Name: Event, dtype: int64

Edit to get full days:

df.groupby(pd.Grouper(key='Event',freq='H'))['Event'].count()
  .reindex(pd.date_range(df.Event.dt.floor('D').min(), 
                         df.Event.dt.ceil('D').max(), 
                         freq='H')).fillna(0)

OUtput:

2018-03-19 00:00:00    0.0
2018-03-19 01:00:00    0.0
2018-03-19 02:00:00    0.0
2018-03-19 03:00:00    0.0
2018-03-19 04:00:00    0.0
2018-03-19 05:00:00    0.0
2018-03-19 06:00:00    0.0
2018-03-19 07:00:00    0.0
2018-03-19 08:00:00    0.0
2018-03-19 09:00:00    0.0
2018-03-19 10:00:00    0.0
2018-03-19 11:00:00    0.0
2018-03-19 12:00:00    0.0
2018-03-19 13:00:00    0.0
2018-03-19 14:00:00    0.0
2018-03-19 15:00:00    1.0
2018-03-19 16:00:00    1.0
2018-03-19 17:00:00    1.0
2018-03-19 18:00:00    1.0
2018-03-19 19:00:00    0.0
2018-03-19 20:00:00    0.0
2018-03-19 21:00:00    1.0
2018-03-19 22:00:00    0.0
2018-03-19 23:00:00    1.0
2018-03-20 00:00:00    0.0
2018-03-20 01:00:00    0.0
2018-03-20 02:00:00    0.0
2018-03-20 03:00:00    0.0
2018-03-20 04:00:00    0.0
2018-03-20 05:00:00    0.0
2018-03-20 06:00:00    1.0
2018-03-20 07:00:00    1.0
2018-03-20 08:00:00    0.0
2018-03-20 09:00:00    0.0
2018-03-20 10:00:00    0.0
2018-03-20 11:00:00    1.0
2018-03-20 12:00:00    0.0
2018-03-20 13:00:00    0.0
2018-03-20 14:00:00    0.0
2018-03-20 15:00:00    0.0
2018-03-20 16:00:00    0.0
2018-03-20 17:00:00    0.0
2018-03-20 18:00:00    0.0
2018-03-20 19:00:00    0.0
2018-03-20 20:00:00    0.0
2018-03-20 21:00:00    0.0
2018-03-20 22:00:00    0.0
2018-03-20 23:00:00    0.0
2018-03-21 00:00:00    0.0
Freq: H, Name: Event, dtype: float64

Upvotes: 1

Related Questions