Reputation: 572
I have a simple pandas dataframe with around 10,000 to 20,000 entries per day. It contains a point column and a datetime (datetime64) column. For example, it looks like this:
point timestamp_local
0 A 2018-09-29 00:00:20
1 A 2018-09-29 00:04:34
2 A 2018-09-29 00:06:59
3 B 2018-09-29 00:11:09
4 B 2018-09-29 01:19:28
... ... ...
24282 B 2018-09-29 21:40:26
24283 C 2018-09-29 21:40:31
24284 C 2018-09-29 21:45:17
24285 A 2018-09-29 22:20:29
24286 B 2018-09-29 22:28:08
What I now want is a dataframe that groups the dataframe above by point and by an interval I specify, and counts the number of entries for each point in each interval. The interval should be configurable: for example 5 minutes or 10 minutes, but also one day, one month or one year.
This is what I got so far to segment the interval:
df['10min_period'] = df.apply(lambda x: "period_%d" % (x['timestamp_local'].minute // 10 + 1), axis=1)
This returns:
point timestamp_local 10min_period
0 A 2018-09-29 00:00:20 period_1
1 B 2018-09-29 00:04:34 period_1
2 B 2018-09-29 00:06:59 period_1
3 C 2018-09-29 00:11:09 period_2
4 C 2018-09-29 01:19:28 period_2
And this counts the periods:
df = df.groupby([df['point'], df['10min_period']]).agg(['count'])
This returns the following dataframe:
timestamp_local
point 10min_period count
A period_1 2092
period_2 2437
period_3 2181
period_4 2525
period_5 2325
period_6 2317
B period_1 1814
period_2 1719
period_3 1732
period_4 1575
period_5 1789
period_6 1781
... ... ...
But this is not what I want, because the period labels are wrong. My code splits the data into 10-minute slots based only on the minute of the hour, independent of the year, month, day and hour. That is exactly what I don't want!
I want a dataframe that is segmented by an interval I specify (e.g. 5 min, 10 min, 1 day, 1 year and so on) and that takes the year, month, day, hour and minute into account. (Note how the periods are named.)
Here is an example of what I want:
point timestamp_local 10min_period
0 A 2018-09-29 00:00:20 period_2018-09-29_00:00:00
1 B 2018-09-29 00:04:34 period_2018-09-29_00:00:00
2 B 2018-09-29 00:06:59 period_2018-09-29_00:00:00
3 C 2018-09-29 00:11:09 period_2018-09-29_00:10:00
4 C 2018-09-29 00:19:28 period_2018-09-29_00:10:00
5 A 2018-09-29 00:00:20 period_2018-09-29_00:00:00
6 B 2018-09-30 01:04:34 period_2018-09-30_01:00:00
7 B 2018-09-30 00:06:59 period_2018-09-30_00:00:00
8 C 2018-10-29 02:15:09 period_2018-10-29_02:10:00
9 C 2019-09-29 01:19:28 period_2019-09-29_01:10:00
It's very important to name the periods that way, so that I know which day and interval each entry belongs to. How can I do this? For example, with a 5-minute interval the periods should be named period_2018-09-29_00:00:00, period_2018-09-29_00:05:00, period_2018-09-29_00:25:00 and so on.
Thank you very much!
Upvotes: 0
Views: 5755
Reputation: 150735
Are you looking for something like this, for minute intervals:
df.groupby(['point', df.timestamp_local.dt.floor('5min')]).size()
and this, for month/year:
df.groupby(['point', df.timestamp_local.dt.to_period('M')]).size()
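To also get the period names asked for in the question, the floored timestamp can be formatted into a label before grouping. The following is only a sketch under some assumptions: the sample rows are illustrative, and count_by_interval and the output column names "period" and "count" are hypothetical names chosen here, not part of pandas.

```python
import pandas as pd

# Small sample in the shape of the question's data (values are illustrative).
df = pd.DataFrame({
    "point": ["A", "B", "B", "C", "C", "B"],
    "timestamp_local": pd.to_datetime([
        "2018-09-29 00:00:20",
        "2018-09-29 00:04:34",
        "2018-09-29 00:06:59",
        "2018-09-29 00:11:09",
        "2018-09-29 00:19:28",
        "2018-09-30 01:04:34",
    ]),
})


def count_by_interval(frame, freq):
    """Count rows per point and per fixed-width time bucket (hypothetical helper)."""
    # Round every timestamp down to the start of its bucket, keeping the full date.
    bucket = frame["timestamp_local"].dt.floor(freq)
    # Build labels such as period_2018-09-29_00:10:00.
    labels = ("period_" + bucket.dt.strftime("%Y-%m-%d_%H:%M:%S")).rename("period")
    return frame.groupby(["point", labels]).size().rename("count").reset_index()


counts_10min = count_by_interval(df, "10min")
counts_daily = count_by_interval(df, "1D")

# Months and years do not have a fixed width, so dt.floor() does not accept them;
# dt.to_period() covers those cases ("M" for month, "Y" for year).
counts_monthly = df.groupby(["point", df["timestamp_local"].dt.to_period("M")]).size()

print(counts_10min)
print(counts_monthly)
```

Because the label contains the full date and time of the bucket start, entries from different days or hours no longer end up in the same period, and changing the interval only means passing a different frequency string.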
Upvotes: 3