Pandas: get average over certain rows and return as a dataframe

Question

I have a df like this

It contains speed and dir recorded at a given date and hour-minute. For example, the first row records that at 7:11 on 20060101, dir=87 and speed=5.

Now, I think the data might be too fine-grained, and I want to use the hourly average for later computation. How can I do it?

I can do it with groupby:

df['Hr']=df['HrMn'].apply(lambda x: str(x)[:-2])
df.groupby(['date', 'Hr'])['speed'].mean()

which returns what I want.

But the result is not a dataframe, so how can I use it for later computation? Specifically, I want to know:

Is the groupby approach I'm using the right approach for this problem? If so, how can I use the result later as a dataframe? (I also need to get dir, dir_max and other attributes.)
The result that groupby returns is not well ordered (by date and Hr). Is there any way to re-order it?

Update:

If I do df.groupby(['date', 'Hr'])['speed'].mean().unstack(), it returns

The data is certainly correct, but I still want it to follow the form of the initial dataframe,

Except that HrMn -> Hr

Vaishali · Accepted Answer

What you are getting is a Series with a MultiIndex on (date, Hr). To turn it back into a flat dataframe, you can try

df.groupby(['date', 'Hr'])['speed'].mean().reset_index()
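
To see the effect, here is a minimal sketch on a made-up frame. The values are hypothetical; only the column names date, HrMn and speed come from the question. The groupby result is a Series indexed by (date, Hr), and reset_index() moves those index levels back into ordinary columns:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "date": [20060101, 20060101, 20060101, 20060102],
    "HrMn": [711, 731, 811, 711],
    "speed": [5.0, 7.0, 4.0, 6.0],
})

# Same hour extraction as in the question: drop the last two digits.
df["Hr"] = df["HrMn"].apply(lambda x: str(x)[:-2])

# Series whose index is a (date, Hr) MultiIndex.
hourly = df.groupby(["date", "Hr"])["speed"].mean()

# Flat DataFrame with columns date, Hr, speed.
hourly_df = hourly.reset_index()

print(type(hourly).__name__)
print(hourly_df)
```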

If you want the mean of the other columns as well, select them with a list (note the double brackets):

df.groupby(['date', 'Hr'])[['speed', 'dir_max', 'speed_max']].mean().reset_index()

EDIT: To apply mean to the speed column and max to dir_max and speed_max:

df.groupby(['date', 'Hr']).agg({'speed': 'mean', 'dir_max': 'max', 'speed_max': 'max'}).reset_index()
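
The per-column aggregation can be sketched end to end on made-up values. This assumes HrMn is stored as an integer such as 711 (the question does not say), so the hour is taken with integer division instead of string slicing. Keeping Hr numeric also makes the groups sort as 7, 8, 10 rather than as the strings '10', '7', '8', which may be the cause of the ordering problem raised in the question. The aggregation functions are passed by name as strings ('mean', 'max'), which recent pandas versions recommend over NumPy callables.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "date": [20060101, 20060101, 20060101, 20060101],
    "HrMn": [711, 731, 1011, 811],
    "speed": [5.0, 7.0, 4.0, 6.0],
    "dir_max": [87, 90, 120, 100],
    "speed_max": [6.0, 9.0, 5.0, 8.0],
})

# Assumption: HrMn is an integer like 711, so // 100 yields the hour (7).
df["Hr"] = df["HrMn"] // 100

# Mean of speed, max of dir_max and speed_max, per (date, Hr) group.
hourly = (
    df.groupby(["date", "Hr"])
      .agg({"speed": "mean", "dir_max": "max", "speed_max": "max"})
      .reset_index()
)

print(hourly)
```

groupby sorts the group keys by default, so with a numeric Hr the rows come out in chronological order without an extra sort_values call.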
