Reputation: 3412
I have the following data:
id,device,event,timestamp
0,a,something,1990-05-01 00:09:05
1,a,something_else,1990-05-01 00:09:08
2,a,we_dont_care,1990-05-01 00:09:23
3,a,whatever,1990-05-01 00:11:05
4,a,whatsoever,1990-05-01 00:12:32
5,a,ok,1990-05-01 00:12:46
6,b,still_dont_care,1990-05-01 00:13:46
7,b,thats_annoying,1990-05-01 00:14:46
8,b,please_stop,1990-05-01 00:14:55
9,b,really_stop,1990-05-01 00:22:46
10,a,ok,1990-05-01 00:23:43
I want to group the data in the following way:
device,timestamp,count
a,1990-05-01 00:09,3
a,1990-05-01 00:11,1
a,1990-05-01 00:12,2
a,1990-05-01 00:23,1
b,1990-05-01 00:13,1
b,1990-05-01 00:14,2
b,1990-05-01 00:22,1
Where timestamp is the timestamp trimmed to the hour (so no minutes or seconds) and count is simply the number of events that happened on a device in that hour (we don't care about the type of event, we just need to count them). Note that I don't want to group by the hour of the day alone; the date matters too:
1990-05-01 00:09:05 and 1990-05-01 00:09:15 should be grouped together
1990-05-01 00:09:05 and 1990-06-01 00:09:15 should not be grouped together
The following does the grouping, but it does not resample, so it leaves gaps when no events happened for a device in a given period. Instead, I would like to have a row with count=0 for those periods:
df_count = df.groupby(["device", df["timestamp"].dt.hour])["id"].count()
Upvotes: 0
Views: 899
Reputation: 2405
We can use dt.floor to trim each timestamp to the hour and group on the result:
>>> df.groupby(df.timestamp.dt.floor('h')).count()
            id  device  event  timestamp
timestamp
1990-05-01  11      11     11         11
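A small sketch of why flooring is used here instead of dt.hour, as in the question's attempt: dt.hour keeps only the hour of the day, so timestamps from different dates fall into the same group, while dt.floor('h') keeps the date. The two timestamps are the ones given in the question; the variable names are only illustrative.

```python
import pandas as pd

# Two timestamps from the question: same hour of day, one month apart.
ts = pd.Series(pd.to_datetime(["1990-05-01 00:09:05", "1990-06-01 00:09:15"]))

# dt.hour drops the date, so both rows land in the same group (hour 0).
groups_by_hour = ts.groupby(ts.dt.hour).size()

# dt.floor("h") keeps the date, so the rows stay in separate groups.
groups_by_floor = ts.groupby(ts.dt.floor("h")).size()

print(len(groups_by_hour))   # 1
print(len(groups_by_floor))  # 2
```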
With device as a separate grouping column:
>>> df.groupby(by=["device", df.timestamp.dt.floor('h')]).count()["event"].reset_index()
  device  timestamp  event
0      a 1990-05-01      7
1      b 1990-05-01      4
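The grouping above only returns periods in which at least one event occurred, so it does not yet give the count=0 rows the question asks for. One possible way to get them, as a sketch rather than the only approach, is to reindex the grouped counts against every (device, period) combination. The freq value is an assumption: the desired output in the question looks like it is bucketed by minute, so "min" is used here, and "h" would give hourly buckets instead. The range is also assumed to run from the earliest to the latest observed bucket across all devices.

```python
import io
import pandas as pd

# Sample data copied from the question.
csv = """id,device,event,timestamp
0,a,something,1990-05-01 00:09:05
1,a,something_else,1990-05-01 00:09:08
2,a,we_dont_care,1990-05-01 00:09:23
3,a,whatever,1990-05-01 00:11:05
4,a,whatsoever,1990-05-01 00:12:32
5,a,ok,1990-05-01 00:12:46
6,b,still_dont_care,1990-05-01 00:13:46
7,b,thats_annoying,1990-05-01 00:14:46
8,b,please_stop,1990-05-01 00:14:55
9,b,really_stop,1990-05-01 00:22:46
10,a,ok,1990-05-01 00:23:43
"""
df = pd.read_csv(io.StringIO(csv), parse_dates=["timestamp"])

freq = "min"  # assumption: minute buckets; use "h" for hourly buckets

# Trim each timestamp to the start of its bucket and count events per device.
floored = df["timestamp"].dt.floor(freq)
observed = df.groupby(["device", floored]).size()

# Build every (device, bucket) pair between the first and last observed bucket.
full_index = pd.MultiIndex.from_product(
    [
        sorted(df["device"].unique()),
        pd.date_range(floored.min(), floored.max(), freq=freq),
    ],
    names=["device", "timestamp"],
)

# Pairs with no events are missing from `observed`; fill them with 0.
counts = observed.reindex(full_index, fill_value=0).rename("count").reset_index()
print(counts.head())
```

If each device should only be filled within its own first and last event, the same reindex could be applied per device instead of over the shared range.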
Upvotes: 1