Mr. Engineer

Reputation: 375

Pandas: compute average and standard deviation by clock time

I have a DataFrame like this:

         date      time  value
0  2019-04-18  07:00:10  100.8
1  2019-04-18  07:00:20   95.6
2  2019-04-18  07:00:30   87.6
3  2019-04-18  07:00:40   94.2

The DataFrame contains a value recorded every 10 seconds for the entire year 2019. I need to calculate the standard deviation and the mean of value for each hour of each date, and create two new columns for them. I first tried separating the hour for each value like this:

df["hour"] = df["time"].astype(str).str[:2]

Then I tried to calculate the standard deviation with:

df["std"] = df.groupby("hour").median().index.get_level_values('value').stack().std()

But that doesn't work. Could I have some advice on the problem?

Upvotes: 2

Views: 1353

Answers (2)

Shubham Sharma

Reputation: 71689

We can split the time column on the delimiter :, take the hour component with str[0], then group the dataframe on date together with the hour component and aggregate column value with mean and std:

hr = df['time'].str.split(':', n=1).str[0]
df.groupby(['date', hr])['value'].agg(['mean', 'std'])
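
On the four sample rows above, which all fall into the same (date, hour) group, this should produce something like:

                  mean       std
date       time
2019-04-18 07    94.55  5.434151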

If you want to broadcast the aggregated values back onto the original dataframe, use transform instead of agg:

g = df.groupby(['date', df['time'].str.split(':', n=1).str[0]])['value']
df['mean'], df['std'] = g.transform('mean'), g.transform('std')

         date      time  value   mean       std
0  2019-04-18  07:00:10  100.8  94.55  5.434151
1  2019-04-18  07:00:20   95.6  94.55  5.434151
2  2019-04-18  07:00:30   87.6  94.55  5.434151
3  2019-04-18  07:00:40   94.2  94.55  5.434151
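
Note that g above rebuilds the hour key inline; if the hr series from the first snippet is still in scope, it can simply be reused instead of splitting the time column a second time, for example:

g = df.groupby(['date', hr])['value']
df['mean'], df['std'] = g.transform('mean'), g.transform('std')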

Upvotes: 4

Rob Raymond

Reputation: 31166

  • I have synthesized sample data. Start by generating a true datetime column
  • groupby() the hour
  • use describe() to get the mean & std
  • merge() back onto the original data frame
import pandas as pd
import numpy as np

# synthesize a year of readings at 10-second intervals
d = pd.date_range("1-Jan-2019", "31-Dec-2019", freq="10S")
df = pd.DataFrame({"datetime": d, "value": np.random.uniform(70, 90, len(d))})

df = df.assign(date=df.datetime.dt.strftime("%Y-%m-%d"),
               time=df.datetime.dt.strftime("%H:%M:%S"))

# create a datetime column - better than manipulating strings
df["datetime"] = pd.to_datetime(df.date + " " + df.time)

# calc mean & std by hour
dfh = (df.groupby(df.datetime.dt.hour, as_index=False)
         .apply(lambda dfa: dfa.describe().T.loc[:, ["mean", "std"]].reset_index(drop=True))
         .droplevel(1))

# merge mean & std by hour back
df.merge(dfh, left_on=df.datetime.dt.hour, right_index=True).drop(columns="key_0")
                    datetime      value       mean       std
0        2019-01-01 00:00:00  86.014209  80.043364  5.777724
1        2019-01-01 00:00:10  77.241141  80.043364  5.777724
2        2019-01-01 00:00:20  71.650739  80.043364  5.777724
3        2019-01-01 00:00:30  71.066332  80.043364  5.777724
4        2019-01-01 00:00:40  77.203291  80.043364  5.777724
...                      ...        ...        ...       ...
3144955  2019-12-30 23:59:10  89.577237  80.009751  5.773007
3144956  2019-12-30 23:59:20  82.154883  80.009751  5.773007
3144957  2019-12-30 23:59:30  82.131952  80.009751  5.773007
3144958  2019-12-30 23:59:40  85.346724  80.009751  5.773007
3144959  2019-12-30 23:59:50  78.122761  80.009751  5.773007

Upvotes: 1
