Reputation: 20119
I have the following schema:
H T datetime
002599f58 A 2019-09-23
002599f58 A 2019-09-23
002599f58 A 2019-09-23
002599f58 A 2019-09-26
002599f58 C 2019-09-26
005344fe8 C 2019-09-30
005344fe8 D 2019-10-02
005344fe8 D 2019-10-05
What I need to do is count the number of distinct days that H is in T. I've tried:
grouper = Grouper(key="datetime", freq="1D")
df = cdr.groupby(["H", "T", grouper])["datetime"].nunique().reset_index(name="numdays")
and others of the type, like size()
. I now I have 14 days, but when I sort the values, they give me numbers greater than 14, so I know those are wrong, but I cannot tell why.
The expected output is a dataframe
H T numdays
002599f58 A 2 (23 & 26)
002599f58 C 1 (26)
005344fe8 C 1 (30)
005344fe8 D 2 (02 & 05)
which tells me, for each different H and T, the number of distinct days that H is in T, where numdays
cannot be larger than 14 (since that's all days I have). I don't need the data in parens, just there to show where the values come from.
Upvotes: 2
Views: 404
Reputation: 862511
I believe you need to remove Grouper
and for dates add Series.dt.date
:
cdr['datetime'] = pd.to_datetime(cdr['datetime'])
cdr['dates'] = cdr['datetime'].dt.date
df = cdr.groupby(["H", "T"])["dates"].nunique().reset_index(name="numdays")
print (df)
H T numdays
0 002599f58 A 2
1 002599f58 C 1
2 005344fe8 C 1
3 005344fe8 D 2
You can test solutions, old and new with convert output to lists:
print (cdr)
H T datetime
0 002599f58 A 2019-09-23 12:45:10
1 002599f58 A 2019-09-23 02:45:10
2 002599f58 A 2019-09-23 12:45:50
3 002599f58 A 2019-09-26 12:25:10
4 002599f58 C 2019-09-26 10:45:10
5 005344fe8 C 2019-09-30 12:25:10
6 005344fe8 D 2019-10-02 12:45:00
7 005344fe8 D 2019-10-05
cdr['dates'] = cdr['datetime'].dt.date
print (cdr.groupby(["H", "T"])["dates"].apply(list))
H T
002599f58 A [2019-09-23, 2019-09-23, 2019-09-23, 2019-09-26]
C [2019-09-26]
005344fe8 C [2019-09-30]
D [2019-10-02, 2019-10-05]
Name: dates, dtype: object
grouper = pd.Grouper(key="datetime", freq="1D")
print (cdr.groupby(["H", "T", grouper])["datetime"].apply(list))
H T datetime
002599f58 A 2019-09-23 [2019-09-23 12:45:10, 2019-09-23 02:45:10, 201...
2019-09-26 [2019-09-26 12:25:10]
C 2019-09-26 [2019-09-26 10:45:10]
005344fe8 C 2019-09-30 [2019-09-30 12:25:10]
D 2019-10-02 [2019-10-02 12:45:00]
2019-10-05 [2019-10-05 00:00:00]
Name: datetime, dtype: object
Upvotes: 3