Jan

Reputation: 572

Pandas group by time interval (5min, 10min, 1day, 1year) and count amount of entries

I have a simple pandas dataframe with around 10000 to 20000 entries for each day. This dataframe contains a point and a datetime (datetime64). For example, it looks like this:

        point   timestamp_local
0       A       2018-09-29 00:00:20
1       A       2018-09-29 00:04:34
2       A       2018-09-29 00:06:59
3       B       2018-09-29 00:11:09
4       B       2018-09-29 01:19:28
...     ...     ...
24282   B       2018-09-29 21:40:26
24283   C       2018-09-29 21:40:31
24284   C       2018-09-29 21:45:17
24285   A       2018-09-29 22:20:29
24286   B       2018-09-29 22:28:08

What I now want to get is a dataframe that groups the dataframe above by point and by an interval I specify, and that counts the number of entries for each point within each interval. The interval should be, for example, a 5-minute interval, a 10-minute interval, or a daily, monthly, or yearly interval.

This is what I got so far to segment the interval:

df['10min_period'] = df.apply(lambda x: "period_%d" % (x['timestamp_local'].minute // 10 + 1), axis=1)

This returns:

    point   timestamp_local         10min_period
0   A       2018-09-29 00:00:20     period_1
1   B       2018-09-29 00:04:34     period_1
2   B       2018-09-29 00:06:59     period_1
3   C       2018-09-29 00:11:09     period_2
4   C       2018-09-29 01:19:28     period_2

And this counts the periods:

df = df.groupby([df['point'], df['10min_period']]).agg(['count'])

This returns the following dataframe:

                           timestamp_local
point   10min_period       count
A       period_1           2092
        period_2           2437
        period_3           2181
        period_4           2525
        period_5           2325
        period_6           2317
B       period_1           1814
        period_2           1719
        period_3           1732
        period_4           1575
        period_5           1789
        period_6           1781
...     ...                ...

But this is not exactly what I want, because the period labels are wrong. My code segments the timestamps into 10-minute intervals regardless of the year, month, day, and hour. That is exactly what I don't want!

I want a dataframe that is segmented by an interval I specify (e.g. 5 min, 10 min, 1 day, 1 year, and so on) but that also takes the year, month, day, hour, and minute into account. (Take a look at how the periods are named!)

Here is an example of what I want:
        point   timestamp_local         10min_period
    0   A       2018-09-29 00:00:20     period_2018-09-29_00:00:00
    1   B       2018-09-29 00:04:34     period_2018-09-29_00:00:00
    2   B       2018-09-29 00:06:59     period_2018-09-29_00:00:00
    3   C       2018-09-29 00:11:09     period_2018-09-29_00:10:00
    4   C       2018-09-29 00:19:28     period_2018-09-29_00:10:00
    5   A       2018-09-29 00:00:20     period_2018-09-29_00:00:00
    6   B       2018-09-30 01:04:34     period_2018-09-30_01:00:00
    7   B       2018-09-30 00:06:59     period_2018-09-30_00:00:00
    8   C       2018-10-29 02:15:09     period_2018-10-29_02:10:00
    9   C       2019-09-29 01:19:28     period_2019-09-29_01:10:00

It's very important to name the periods this way so that I know which day and interval each entry belongs to. How can I do this? For example, with a 5-minute interval the periods should be named period_2018-09-29_00:00:00, period_2018-09-29_00:05:00, ..., period_2018-09-29_00:25:00, and so on.

Thank you very much!

Upvotes: 0

Views: 5755

Answers (1)

Quang Hoang

Reputation: 150735

Are you looking for something like this, for minute intervals:

df.groupby(['point', df.timestamp_local.dt.floor('5min')]).size()
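To also get the period labels from the question, one option is to format the floored timestamp as a string before grouping. This is only a minimal sketch: the sample rows, the `interval` variable, and the `period`/`count` column names are made up for illustration, and it assumes `timestamp_local` is already a datetime64 column.

```python
import pandas as pd

# Small made-up sample in the shape of the question's dataframe.
df = pd.DataFrame({
    "point": ["A", "B", "B", "C", "C", "C"],
    "timestamp_local": pd.to_datetime([
        "2018-09-29 00:00:20",
        "2018-09-29 00:04:34",
        "2018-09-29 00:06:59",
        "2018-09-29 00:11:09",
        "2018-09-29 00:19:28",
        "2019-09-29 01:19:28",
    ]),
})

interval = "10min"  # hypothetical setting; "5min" would work the same way

# Floor each timestamp to the start of its interval, then turn it into a label.
floored = df["timestamp_local"].dt.floor(interval)
df["period"] = "period_" + floored.dt.strftime("%Y-%m-%d_%H:%M:%S")

# Count the rows per point and period.
counts = df.groupby(["point", "period"]).size().reset_index(name="count")
print(counts)
```

Because the label contains the full date and time, rows from different days or years no longer end up in the same period.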

and this, for month/year intervals:

df.groupby(['point', df.timestamp_local.dt.to_period('M')]).size()
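The same pattern can be wrapped in a small helper so the calendar frequency is passed in as a parameter. Again, this is just a sketch: `count_by_period` is a hypothetical helper name, the sample data is invented, and "M" and "Y" are used as the monthly and yearly period aliases.

```python
import pandas as pd

# Small made-up sample spanning several months and years.
df = pd.DataFrame({
    "point": ["A", "A", "B", "B"],
    "timestamp_local": pd.to_datetime([
        "2018-09-29 00:00:20",
        "2018-09-30 01:04:34",
        "2018-10-29 02:15:09",
        "2019-09-29 01:19:28",
    ]),
})

def count_by_period(frame, freq):
    """Count rows per point and calendar period (hypothetical helper)."""
    # Convert timestamps to periods and use their string form as the label,
    # e.g. "2018-09" for monthly or "2018" for yearly periods.
    period = frame["timestamp_local"].dt.to_period(freq).astype(str).rename("period")
    return frame.groupby(["point", period]).size().reset_index(name="count")

monthly = count_by_period(df, "M")
yearly = count_by_period(df, "Y")
print(monthly)
print(yearly)
```

Calendar periods such as months and years have variable length, which is why `to_period` is used here rather than `dt.floor`.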

Upvotes: 3
