Reputation: 8628
I have the following dataframe:
df =
Id Datetime
1 2017-03-02 18:06:20
1 2017-03-02 18:05:10
1 2017-04-01 18:04:09
1 2017-03-02 19:06:50
1 2017-04-01 19:07:22
2 2017-03-03 18:09:15
2 2017-05-03 19:07:05
2 2017-05-03 20:19:08
I want to know the average absolute frequency per hour for each ID
. In other words, I need to calculate the absolute frequency of ID
per hours, averaged over days and months. The expected result is this one:
ID HOUR FREQ
1 18 1.5
1 19 1
1 20 0
2 18 1
2 19 1
2 20 1
For example, in case of ID
equal to 1, there are 2 entries at 18 for the dates 2017-03-02
, and 1 entry at 18 for the date 2017-04-01
. Therefore FREQ
is equal to 1.5
for HOUR
=18 and ID
=1.
I have this code, but result
does not get created properly:
df["Hour"] = df["Datetimr"].apply(lambda x: x.hour)
result = df.groupby(["Id","Hour"]).agg({'Hour':'size'}).reset_index()
Upvotes: 0
Views: 428
Reputation: 862671
Use:
#if necessary convert to datetime
df['Datetime'] = pd.to_datetime(df['Datetime'])
#groupby by column, by hours and by dates
df = df.groupby([df['ID'], df['Datetime'].dt.hour, df['Datetime'].dt.date]).size()
#groupby by ID and hours
df1 = df.groupby(level=[0,1]).mean()
#add missing values by reindex by Multiindex
mux = pd.MultiIndex.from_product([df1.index.levels[0], df1.index.levels[1]],
names=df1.index.names)
df1 = df1.reindex(mux, fill_value=0).reset_index(name='FREQ')
print (df1)
ID Datetime FREQ
0 1 18 1.5
1 1 19 1.0
2 1 20 0.0
3 2 18 1.0
4 2 19 1.0
5 2 20 1.0
Upvotes: 1
Reputation: 609
df["Hour"] = df["Datetime"].apply(lambda x:x.hour)
df["month"] = df["Datetime"].apply(lambda x:x.month)
dfg = df.groupby(["Id","Hour"])
freq = dfg.size()/dfg["month"].unique().apply(lambda x:len(x))
result = freq.reset_index(name="FREQ")
Upvotes: 0
Reputation: 323236
Try this .
df["Hour"] = pd.to_datetime(df.Datetime).dt.hour
df['Date'] = pd.to_datetime(df.Datetime).dt.date
df1=df.groupby(["Id","Hour","Date"]).size().reset_index()
df2=df1.groupby(["Id","Hour"])[0].mean().unstack('Hour').stack('Hour', dropna=False).reset_index()
df2.fillna(0)
Out[265]:
Id Hour 0
0 1 18 1.5
1 1 19 1.0
2 1 20 0.0
3 2 18 1.0
4 2 19 1.0
5 2 20 1.0
Upvotes: 1