parker
parker

Reputation: 67

How to find average and maximum in an hour of dataframe pandas?

I have 24 hours of minutes. Here's the example

      log_time              memory1 memory2
0     2020-06-21 00:00:00   1000.0  500.0
1     2020-06-21 00:01:00   1000.0  500.0
2     2020-06-21 00:02:00   1000.0  500.0
3     2020-06-21 00:03:00   1000.0  500.0
4     2020-06-21 00:04:00   1000.0  500.0
...
1440  2020-06-21 23:59:00   1000.0  500.0

And I want to make hourly report with output like this.

      log_time              avg_memory1 avg_memory2 max_memory1 max_memory2
0     2020-06-21 00:00:00   1000.0      500.0       1000        500
1     2020-06-21 01:01:00   1000.0      500.0       1000        500
2     2020-06-21 02:02:00   1000.0      500.0       1000        500
3     2020-06-21 03:03:00   1000.0      500.0       1000        500
4     2020-06-21 04:04:00   1000.0      500.0       1000        500
...
23    2020-06-21 23:00:00   1000.0      500.0       1000        500

Code I tried:

df = df.set_index('log_time').resample('H').mean().reset_index()

The code above only showing the average and I'm getting hardtime finding the maximum values.

Upvotes: 0

Views: 403

Answers (2)

Graham Duncan
Graham Duncan

Reputation: 29

without setting the Timestamp column as the index (and creating a DateTimeIndex), you can use pd.Grouper to instantiate a resampling interval on a datetime column (which is not the index). This is helpful sometimes since setting the index to a DateTimeIndex can be slow as the size of the DataFrame increases



grouper = pd.Grouper(freq='1H', key='log_time')

df.groupby(grouper)[['memory1', 'memory2']].agg(['mean', 'max'])

# note: on the groupby object, you could use the describe() method to get more calculated results, such as the min, std, quartiles, etc. 

# that would look like this:
df.groupby(grouper)[['memory1', 'memory2']].describe()

Upvotes: 1

jezrael
jezrael

Reputation: 862511

Use Rolling.agg by list of functions, then renmae column and sorting, last flatten Multiindex by f-stringss:

df['log_time'] = pd.to_datetime(df['log_time'])

df1 = (df.set_index('log_time')
         .resample('H')
         .agg(['mean','max'])
         .rename(columns={'mean':'avg'})
         .sort_index(axis=1, level=1))
df1.columns = df1.columns.map(lambda x: f'{x[1]}_{x[0]}')
df1 = df1.reset_index()
print (df1)
    log_time  avg_memory1  avg_memory2  max_memory1  max_memory2
0 2020-06-21       1000.0        500.0       1000.0        500.0

Upvotes: 0

Related Questions