Pandas groupby and resample

Question

I have the following data:

id,device,event,timestamp
0,a,something,1990-05-01 00:09:05
1,a,something_else,1990-05-01 00:09:08
2,a,we_dont_care,1990-05-01 00:09:23
3,a,whatever,1990-05-01 00:11:05
4,a,whatsoever,1990-05-01 00:12:32
5,a,ok,1990-05-01 00:12:46
6,b,still_dont_care,1990-05-01 00:13:46
7,b,thats_annoying,1990-05-01 00:14:46
8,b,please_stop,1990-05-01 00:14:55
9,b,really_stop,1990-05-01 00:22:46
10,a,ok,1990-05-01 00:23:43

I want to group the data in the following way:

device,timestamp,count
a,1990-05-01 00:09,3
a,1990-05-01 00:11,1
a,1990-05-01 00:12,2
a,1990-05-01 00:23,1
b,1990-05-01 00:13,1
b,1990-05-01 00:14,2
b,1990-05-01 00:22,1

Here timestamp is the timestamp truncated to the hour (no minutes or seconds), and count is the number of events that happened on a device in that hour (the event type doesn't matter; we only need to count them). Note that I don't want to group by the hour of the day alone:

1990-05-01 00:09:05 and 1990-05-01 00:09:15 should be grouped together
1990-05-01 00:09:05 and 1990-06-01 00:09:15 should not be grouped together

The following does the grouping, but it does not resample, so it leaves no row for periods in which a device had no events. Instead, I would like a row with count=0:

df_count = df.groupby(["device", df["timestamp"].dt.hour])["id"].count()

crayxt · Accepted Answer

We can floor the timestamp to the hour and group by that:

>>> df.groupby(df.timestamp.dt.floor('h')).count()
            id  device  event  timestamp
timestamp
1990-05-01  11      11     11         11

To keep device as a separate grouping column:

>>> df.groupby(by=["device", df.timestamp.dt.floor('h')]).count()["event"].reset_index()
  device  timestamp  event
0      a 1990-05-01      7
1      b 1990-05-01      4
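
The output above groups by hour, whereas the expected output in the question is shown per minute, and the question also asks for count=0 rows. Below is a minimal sketch of both, assuming the sample data is loaded with timestamp parsed as datetime. The names per_minute and filled are only illustrative, and the zero-filled range here runs from each device's first event to its last, which may or may not be the range you want.

```python
import io

import pandas as pd

# Sample data copied from the question.
csv_text = """id,device,event,timestamp
0,a,something,1990-05-01 00:09:05
1,a,something_else,1990-05-01 00:09:08
2,a,we_dont_care,1990-05-01 00:09:23
3,a,whatever,1990-05-01 00:11:05
4,a,whatsoever,1990-05-01 00:12:32
5,a,ok,1990-05-01 00:12:46
6,b,still_dont_care,1990-05-01 00:13:46
7,b,thats_annoying,1990-05-01 00:14:46
8,b,please_stop,1990-05-01 00:14:55
9,b,really_stop,1990-05-01 00:22:46
10,a,ok,1990-05-01 00:23:43
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Count events per device per minute; only minutes that have events appear.
per_minute = (
    df.groupby(["device", df["timestamp"].dt.floor("min")])
      .size()
      .reset_index(name="count")
)

# Resample each device separately so that minutes without events
# show up with count 0 (between that device's first and last event).
filled = (
    df.set_index("timestamp")
      .groupby("device")["id"]
      .resample("min")
      .count()
      .reset_index(name="count")
)

print(per_minute)
print(filled)
```

Here per_minute has the seven rows listed in the question's expected output, and filled adds the empty minutes in between. If every device should cover the same overall time range, the result would need to be reindexed against a common range instead.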
