user2317421

Pandas group by irregular date ranges

I have a DataFrame with multiple datetimes:

from faker import Faker
from datetime import datetime
import pandas as pd

fake = Faker()

n = 100
start_date = datetime(2020, 1, 1, 0, 6, 0)
end_date = datetime(2020, 2, 1, 0, 17, 0)

df = pd.DataFrame(
    {"datetime": [fake.date_time_between(start_date=start_date, end_date=end_date) for _ in range(n)],
     "count": 1}
).sort_values(by="datetime")
              datetime  count
61 2020-01-01 12:56:39      1
0  2020-01-01 15:10:35      1
22 2020-01-02 09:37:50      1
41 2020-01-02 15:46:58      1
44 2020-01-03 06:49:39      1
..                 ...    ...
89 2020-01-29 09:51:02      1
98 2020-01-29 22:43:13      1
39 2020-01-30 01:40:48      1
79 2020-01-31 14:07:28      1
43 2020-01-31 20:24:43      1

I need to group by datetime, where the cutoffs are daily at 6am and 5pm (so, the first group would be in [2019-12-31T17:00:00, 2020-01-01T06:00:00), the second group would be in [2020-01-01T06:00:00, 2020-01-01T17:00:00), and so forth.

I've looked into grouping by datetime using pd.Grouper, but I haven't been able to figure out how to use freq for intervals of different lengths (in this case, 13h and 11h, instead of two intervals of 12h).
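For reference, here is a sketch (with made-up toy data; variable names are mine) of what a fixed-frequency pd.Grouper can do: freq takes a single interval length, so the closest fit is uniform 12h bins anchored at 17:00, which puts the morning cutoff at 05:00 instead of the 06:00 I need:

```python
import pandas as pd

# Hypothetical toy data: two timestamps on either side of 05:00.
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2020-01-01 04:00", "2020-01-01 06:00"]),
    "count": 1,
})

# Uniform 12h bins anchored at 17:00 -> cutoffs at 05:00 and 17:00,
# not the uneven 13h/11h (06:00 and 17:00) split required.
res = df.groupby(
    pd.Grouper(key="datetime", freq="12h", origin=pd.Timestamp("2019-12-31 17:00"))
)["count"].sum()
```

Here both rows should land in the 06:00–17:00 group, but the Grouper splits them across the 05:00 cutoff.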

Upvotes: 1

Views: 304

Answers (1)

Henry Yik

Reputation: 22493

If I understand correctly, you can first resample to 1H, create a group number by flagging the cutoff hours, and finally group by that number:

# Resample to hourly bins so every hour between the first and last
# timestamp is present (empty hours get a count of 0).
df = df.resample("1H", on="datetime").sum()

# Every hour equal to 6 or 17 starts a new group; the cumulative sum
# of those flags is the group number.
df["group"] = df.index.hour.isin([6, 17]).cumsum()

print(df.reset_index().groupby("group").agg({"datetime": "first", "count": "sum"}))

                 datetime  count
group                           
0     2020-01-01 03:00:00      1
1     2020-01-01 06:00:00      3
2     2020-01-01 17:00:00      0
3     2020-01-02 06:00:00      1
4     2020-01-02 17:00:00      2
5     2020-01-03 06:00:00      1
6     2020-01-03 17:00:00      4
7     2020-01-04 06:00:00      2
8     2020-01-04 17:00:00      5
9     2020-01-05 06:00:00      2
10    2020-01-05 17:00:00      2
11    2020-01-06 06:00:00      0
12    2020-01-06 17:00:00      1
13    2020-01-07 06:00:00      2
14    2020-01-07 17:00:00      1
15    2020-01-08 06:00:00      0
16    2020-01-08 17:00:00      2
17    2020-01-09 06:00:00      2
18    2020-01-09 17:00:00      1
19    2020-01-10 06:00:00      2
20    2020-01-10 17:00:00      3
21    2020-01-11 06:00:00      0
22    2020-01-11 17:00:00      0
23    2020-01-12 06:00:00      2
24    2020-01-12 17:00:00      1
25    2020-01-13 06:00:00      1
26    2020-01-13 17:00:00      0
27    2020-01-14 06:00:00      3
28    2020-01-14 17:00:00      3
29    2020-01-15 06:00:00      1
30    2020-01-15 17:00:00      3
31    2020-01-16 06:00:00      0
32    2020-01-16 17:00:00      0
33    2020-01-17 06:00:00      1
34    2020-01-17 17:00:00      2
35    2020-01-18 06:00:00      2
36    2020-01-18 17:00:00      3
37    2020-01-19 06:00:00      2
38    2020-01-19 17:00:00      1
39    2020-01-20 06:00:00      3
40    2020-01-20 17:00:00      2
41    2020-01-21 06:00:00      2
42    2020-01-21 17:00:00      2
43    2020-01-22 06:00:00      3
44    2020-01-22 17:00:00      2
45    2020-01-23 06:00:00      1
46    2020-01-23 17:00:00      2
47    2020-01-24 06:00:00      1
48    2020-01-24 17:00:00      0
49    2020-01-25 06:00:00      0
50    2020-01-25 17:00:00      3
51    2020-01-26 06:00:00      2
52    2020-01-26 17:00:00      0
53    2020-01-27 06:00:00      3
54    2020-01-27 17:00:00      1
55    2020-01-28 06:00:00      0
56    2020-01-28 17:00:00      1
57    2020-01-29 06:00:00      1
58    2020-01-29 17:00:00      0
59    2020-01-30 06:00:00      1
60    2020-01-30 17:00:00      3
61    2020-01-31 06:00:00      0
62    2020-01-31 17:00:00      1
63    2020-02-01 06:00:00      4

Upvotes: 2
