Dinosaurius
Dinosaurius

Reputation: 8628

How to get the average absolute frequency per hour for each group?

I have the following dataframe:

df =
    Id         Datetime
    1          2017-03-02 18:06:20
    1          2017-03-02 18:05:10
    1          2017-04-01 18:04:09
    1          2017-03-02 19:06:50
    1          2017-04-01 19:07:22
    2          2017-03-03 18:09:15
    2          2017-05-03 19:07:05
    2          2017-05-03 20:19:08

I want to know the average absolute frequency per hour for each ID. In other words, I need to calculate the absolute frequency of ID per hours, averaged over days and months. The expected result is this one:

ID   HOUR   FREQ
1    18     1.5
1    19     1
1    20     0
2    18     1
2    19     1
2    20     1

For example, in case of ID equal to 1, there are 2 entries at 18 for the dates 2017-03-02, and 1 entry at 18 for the date 2017-04-01. Therefore FREQ is equal to 1.5 for HOUR=18 and ID=1.

I have this code, but result does not get created properly:

df["Hour"] = df["Datetimr"].apply(lambda x: x.hour)
result = df.groupby(["Id","Hour"]).agg({'Hour':'size'}).reset_index()

Upvotes: 0

Views: 428

Answers (3)

jezrael
jezrael

Reputation: 862671

Use:

#if necessary convert to datetime
df['Datetime'] = pd.to_datetime(df['Datetime'])

#groupby by column, by hours and by dates
df = df.groupby([df['ID'],  df['Datetime'].dt.hour,  df['Datetime'].dt.date]).size()
#groupby by ID and hours
df1 = df.groupby(level=[0,1]).mean()
#add missing values by reindex by Multiindex
mux = pd.MultiIndex.from_product([df1.index.levels[0], df1.index.levels[1]], 
                                 names=df1.index.names)
df1 = df1.reindex(mux, fill_value=0).reset_index(name='FREQ')
print (df1)
   ID  Datetime  FREQ
0   1        18   1.5
1   1        19   1.0
2   1        20   0.0
3   2        18   1.0
4   2        19   1.0
5   2        20   1.0

Upvotes: 1

Huang
Huang

Reputation: 609

df["Hour"] = df["Datetime"].apply(lambda x:x.hour)
df["month"] = df["Datetime"].apply(lambda x:x.month)
dfg = df.groupby(["Id","Hour"])
freq = dfg.size()/dfg["month"].unique().apply(lambda x:len(x))
result = freq.reset_index(name="FREQ")

Upvotes: 0

BENY
BENY

Reputation: 323236

Try this .

df["Hour"] = pd.to_datetime(df.Datetime).dt.hour
df['Date'] = pd.to_datetime(df.Datetime).dt.date
df1=df.groupby(["Id","Hour","Date"]).size().reset_index()
df2=df1.groupby(["Id","Hour"])[0].mean().unstack('Hour').stack('Hour', dropna=False).reset_index()
df2.fillna(0)


Out[265]: 
   Id  Hour    0
0   1    18  1.5
1   1    19  1.0
2   1    20  0.0
3   2    18  1.0
4   2    19  1.0
5   2    20  1.0

Upvotes: 1

Related Questions