Reputation: 603
I'm trying to create a timeline of events over the past day with a series of dates. Each date is considered an occurrence of an event. Occurrences need to be grouped by the hour. I need to include zero values in the timeline.
Sample Data
items = [
datetime.datetime(2018, 3, 19, 16, 51, 48),
datetime.datetime(2018, 3, 19, 17, 25, 19),
datetime.datetime(2018, 3, 20, 6, 33, 35),
datetime.datetime(2018, 3, 19, 23, 21, 35),
datetime.datetime(2018, 3, 19, 15, 8, 41),
datetime.datetime(2018, 3, 19, 21, 44, 16),
datetime.datetime(2018, 3, 19, 18, 21, 28),
datetime.datetime(2018, 3, 20, 7, 20, 22),
datetime.datetime(2018, 3, 20, 11, 15, 43)
]
Right now, I have it working, but this doesn't feel like the correct way to do this. Any advice?
Current solution
import pandas as pd
def _generate_timeseries(items, start_ts, end_ts):
# add start/end times to the data
items.insert(0, start_ts)
items.append(end_ts)
# value each datetime as one occurrence
data = [1 for x in range(len(items))]
timeseries = pd.Series(data, index=items)
hourly_data = timeseries.resample('H').sum()
timeline = hourly_data.tolist()
return [{'mentions': x} for x in timeline[1:-1]]
Results Example
timeline =[
{'mentions': 4}, {'mentions': 2}, {'mentions': 1}, {'mentions': 0}, {'mentions': 3}, {'mentions': 2}, {'mentions': 2}, {'mentions': 1}, {'mentions': 1}, {'mentions': 0}, {'mentions': 1}, {'mentions': 0}, {'mentions': 14}, {'mentions': 1}, {'mentions': 4}, {'mentions': 2}, {'mentions': 3}, {'mentions': 2}, {'mentions': 1}, {'mentions': 2}, {'mentions': 6}, {'mentions': 2}, {'mentions': 2}
]
Upvotes: 0
Views: 271
Reputation: 153460
IIUC, you can use this:
df = pd.DataFrame({'Event':items})
df.groupby(pd.Grouper(key='Event',freq='H'))['Event'].count()
Output:
Event
2018-03-19 15:00:00 1
2018-03-19 16:00:00 1
2018-03-19 17:00:00 1
2018-03-19 18:00:00 1
2018-03-19 19:00:00 0
2018-03-19 20:00:00 0
2018-03-19 21:00:00 1
2018-03-19 22:00:00 0
2018-03-19 23:00:00 1
2018-03-20 00:00:00 0
2018-03-20 01:00:00 0
2018-03-20 02:00:00 0
2018-03-20 03:00:00 0
2018-03-20 04:00:00 0
2018-03-20 05:00:00 0
2018-03-20 06:00:00 1
2018-03-20 07:00:00 1
2018-03-20 08:00:00 0
2018-03-20 09:00:00 0
2018-03-20 10:00:00 0
2018-03-20 11:00:00 1
Freq: H, Name: Event, dtype: int64
df.groupby(pd.Grouper(key='Event',freq='H'))['Event'].count()
.reindex(pd.date_range(df.Event.dt.floor('D').min(),
df.Event.dt.ceil('D').max(),
freq='H')).fillna(0)
OUtput:
2018-03-19 00:00:00 0.0
2018-03-19 01:00:00 0.0
2018-03-19 02:00:00 0.0
2018-03-19 03:00:00 0.0
2018-03-19 04:00:00 0.0
2018-03-19 05:00:00 0.0
2018-03-19 06:00:00 0.0
2018-03-19 07:00:00 0.0
2018-03-19 08:00:00 0.0
2018-03-19 09:00:00 0.0
2018-03-19 10:00:00 0.0
2018-03-19 11:00:00 0.0
2018-03-19 12:00:00 0.0
2018-03-19 13:00:00 0.0
2018-03-19 14:00:00 0.0
2018-03-19 15:00:00 1.0
2018-03-19 16:00:00 1.0
2018-03-19 17:00:00 1.0
2018-03-19 18:00:00 1.0
2018-03-19 19:00:00 0.0
2018-03-19 20:00:00 0.0
2018-03-19 21:00:00 1.0
2018-03-19 22:00:00 0.0
2018-03-19 23:00:00 1.0
2018-03-20 00:00:00 0.0
2018-03-20 01:00:00 0.0
2018-03-20 02:00:00 0.0
2018-03-20 03:00:00 0.0
2018-03-20 04:00:00 0.0
2018-03-20 05:00:00 0.0
2018-03-20 06:00:00 1.0
2018-03-20 07:00:00 1.0
2018-03-20 08:00:00 0.0
2018-03-20 09:00:00 0.0
2018-03-20 10:00:00 0.0
2018-03-20 11:00:00 1.0
2018-03-20 12:00:00 0.0
2018-03-20 13:00:00 0.0
2018-03-20 14:00:00 0.0
2018-03-20 15:00:00 0.0
2018-03-20 16:00:00 0.0
2018-03-20 17:00:00 0.0
2018-03-20 18:00:00 0.0
2018-03-20 19:00:00 0.0
2018-03-20 20:00:00 0.0
2018-03-20 21:00:00 0.0
2018-03-20 22:00:00 0.0
2018-03-20 23:00:00 0.0
2018-03-21 00:00:00 0.0
Freq: H, Name: Event, dtype: float64
Upvotes: 1