Reputation: 21613
I have a df like this
It contains speed and dir recorded at different dates and times (hour and minute). For example, the first row records that at 7:11 on 20060101, dir=87 and speed=5.
Now, I think the data might be too precise, and I want to use the average at each hour for later computation. How can I do it?
I can do it with groupby:
df['Hr']=df['HrMn'].apply(lambda x: str(x)[:-2])
df.groupby(['date', 'Hr'])['speed'].mean()
which would return what I want
But it is not a dataframe, so how can I use it for later computation? Specifically, I want to know:
1. Is the groupby approach I'm using the right approach for this problem? If so, how do I use the result later as a dataframe? (I also need to get dir, dir_max and other attributes as well.)
2. The result that groupby returns is not well ordered (in date and Hr). Is there any way to re-order it?
Update:
If I do df.groupby(['date', 'Hr'])['speed'].mean().unstack(), it would return
The data is certainly correct, but I still want it to follow the form of the initial dataframe, except with HrMn replaced by Hr.
Upvotes: 2
Views: 2262
Reputation: 38425
What you are getting is a Series with a MultiIndex (date, Hr), not a DataFrame. To turn it back into a regular DataFrame, you can try
df.groupby(['date', 'Hr'])['speed'].mean().reset_index()
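As a minimal sketch of what reset_index() does here: the sample values below are made up for illustration (only the first row follows the question), and the column names are assumed to match the question's df.

```python
import pandas as pd

# Hypothetical sample data; only the first row (7:11, speed=5) comes from the question.
df = pd.DataFrame({
    "date": [20060101, 20060101, 20060101, 20060102],
    "HrMn": [711, 731, 811, 711],
    "speed": [5, 7, 4, 6],
})

# Same hour extraction as in the question: drop the last two digits.
df["Hr"] = df["HrMn"].apply(lambda x: str(x)[:-2])

# groupby(...)[col].mean() gives a Series indexed by (date, Hr).
hourly = df.groupby(["date", "Hr"])["speed"].mean()

# reset_index() moves date and Hr back into ordinary columns.
hourly_df = hourly.reset_index()
print(hourly_df)
```

After this, hourly_df has the columns date, Hr and speed and can be used like any other dataframe.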
If you want the mean of the other columns as well, select them with a list (double brackets), try
df.groupby(['date', 'Hr'])[['speed', 'dir_max', 'speed_max']].mean().reset_index()
EDIT: Applying mean on speed column and max on dir_max and speed_max
df.groupby(['date', 'Hr']).agg({'speed': 'mean', 'dir_max': 'max', 'speed_max': 'max'}).reset_index()
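On the ordering question, one possible cause (an assumption, since the output is not shown) is that Hr is built as a string, so '10' sorts before '7'. The sketch below assumes HrMn is stored as an integer in HHMM form and uses made-up sample values; it derives an integer hour and sorts explicitly.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question, values are illustrative.
df = pd.DataFrame({
    "date": [20060101, 20060101, 20060101, 20060101],
    "HrMn": [711, 731, 1011, 1051],
    "speed": [5, 7, 4, 6],
    "dir_max": [90, 95, 120, 115],
    "speed_max": [8, 9, 6, 7],
})

# Assumes HrMn is an integer like 711 (7:11); integer division keeps the hour numeric.
df["Hr"] = df["HrMn"] // 100

result = (
    df.groupby(["date", "Hr"])
      .agg({"speed": "mean", "dir_max": "max", "speed_max": "max"})
      .reset_index()
      .sort_values(["date", "Hr"])  # explicit ordering by date, then hour
)
print(result)
```

With a numeric Hr, hour 7 comes before hour 10, and the result keeps the layout of the original dataframe with Hr in place of HrMn.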
Upvotes: 1