ZK Zhao

Reputation: 21613

Pandas: get average over certain rows and return as a dataframe

I have a df like this

enter image description here

It contains speed and dir recorded at a given date and time (hour and minute). For example, the first row records that at 7:11 on 20060101, dir=87 and speed=5.

Now, I think the data might be too fine-grained, and I want to use the hourly average for later computation. How can I do it?


I can do it with groupby

df['Hr']=df['HrMn'].apply(lambda x: str(x)[:-2])
df.groupby(['date', 'Hr'])['speed'].mean()

which would return what I want

enter image description here

But it is not a dataframe, so how can I use it for later computation? Specifically, I want to know

  1. Is the groupby approach I'm using the right approach for this problem? If so, how can I use the result later as a dataframe? (I also need to get dir, dir_max and other attributes as well.)

  2. The result that groupby returns is not well ordered (by date and Hr). Is there any way to re-order it?


Update:

If I do df.groupby(['date', 'Hr'])['speed'].mean().unstack(), it would return

enter image description here

The data is certainly correct, but I still want it to follow the initial dataframe's form, as in

enter image description here

Except that HrMn -> Hr

Upvotes: 2

Views: 2262

Answers (1)

Vaishali

Reputation: 38425

What you are getting is a Series with a MultiIndex (date, Hr), not a dataframe. To turn the index levels back into ordinary columns, you can try

df.groupby(['date', 'Hr'])['speed'].mean().reset_index()
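
To make the difference concrete, here is a minimal sketch on made-up data. The sample values are hypothetical; only the column names (date, HrMn, speed, Hr) come from the question. It shows that the grouped mean is a Series and that reset_index() gives back a flat dataframe:

```python
import pandas as pd

# Hypothetical sample rows; only the column names follow the question.
df = pd.DataFrame({
    "date": [20060101, 20060101, 20060101, 20060102],
    "HrMn": [711, 722, 811, 711],
    "speed": [5.0, 7.0, 4.0, 6.0],
})

# Same hour extraction as in the question (drops the two minute digits).
df["Hr"] = df["HrMn"].apply(lambda x: str(x)[:-2])

# A Series whose index is the (date, Hr) MultiIndex.
grouped = df.groupby(["date", "Hr"])["speed"].mean()

# reset_index() moves date and Hr back into regular columns.
as_frame = grouped.reset_index()

print(type(grouped).__name__)   # Series
print(type(as_frame).__name__)  # DataFrame
print(list(as_frame.columns))   # ['date', 'Hr', 'speed']
```

After this, as_frame can be used like any other dataframe in later computation.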

If you want the mean of the other columns as well, select them with a list (double brackets), try

df.groupby(['date', 'Hr'])[['speed', 'dir_max', 'speed_max']].mean().reset_index()

EDIT: Applying mean on speed column and max on dir_max and speed_max

df.groupby(['date', 'Hr']).agg({'speed': 'mean', 'dir_max': 'max', 'speed_max': 'max'}).reset_index()
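
On the ordering question: one likely cause is that Hr is built as a string, so '10' sorts before '7'. The sketch below assumes HrMn is stored as an integer such as 711 or 1015 (the question does not state the dtype) and derives Hr by integer division instead. The sample values are hypothetical:

```python
import pandas as pd

# Hypothetical sample rows; only the column names follow the question.
df = pd.DataFrame({
    "date": [20060101, 20060101, 20060101, 20060101, 20060102],
    "HrMn": [1015, 711, 722, 1045, 5],
    "speed": [3.0, 5.0, 7.0, 5.0, 2.0],
    "dir_max": [90, 87, 100, 95, 10],
    "speed_max": [4.0, 6.0, 9.0, 8.0, 3.0],
})

# Integer hour: 711 -> 7, 1015 -> 10, 5 -> 0 (assumes HrMn is an integer).
df["Hr"] = df["HrMn"] // 100

hourly = (
    df.groupby(["date", "Hr"], as_index=False)  # keep date and Hr as columns
      .agg({"speed": "mean", "dir_max": "max", "speed_max": "max"})
      .sort_values(["date", "Hr"])              # explicit numeric ordering
      .reset_index(drop=True)
)

print(hourly)
```

groupby already sorts by the group keys by default, so the sort_values call is only there to make the ordering explicit. Integer division also handles times shortly after midnight, where str(x)[:-2] would give an empty string for a value like 5.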

Upvotes: 1
