codemastro

Reputation: 85

How to average data over a specific time period, recording the ending time

I need help with this. I want to average my data over fixed time intervals, for example 3-second intervals. My data is at one-second resolution, and I want to change the resolution to 3 seconds by taking the average of each interval and labelling it with the interval's ending time.

The Input:

time S
2:30:01 5
2:30:02 9
2:30:03 450
2:30:04 7
2:30:05 10
2:30:06 300

This is what I was thinking... Alas, this is not changing the time intervals and the average isn't at the right index.

import pandas

step = 3  #3 seconds
df = df.groupby(df.index // step).mean()

Note: If you think there's another way to approach the problem, I'm all ears.

Desired output:

time S
2:30:03 154.7
2:30:06 105.7

Upvotes: 1

Views: 810

Answers (1)

alvas

Reputation: 122012

TL;DR

import pandas as pd

from datetime import datetime, timedelta


x = [ ['2:30:01', '5'],
 ['2:30:02', '9'],
 ['2:30:03', '450'],
 ['2:30:04', '7'],
 ['2:30:05', '10'],
 ['2:30:06', '300']]


df = pd.DataFrame(x, columns=(['time', 's']))
df['time'] = df['time'].apply(lambda t: datetime.strptime(t, '%H:%M:%S'))
df['s'] = df['s'].astype(int)

df_new = pd.DataFrame([{'start_time':interval_start.strftime("%H:%M:%S"), 
                        'end_time': (interval_start+timedelta(0,2)).strftime("%H:%M:%S"),
                        's': sum(rows['s'])/len(rows['s'])} 
                        for interval_start, rows in 
                        df.set_index('time').resample('3s',offset="1s")])

[out]:

    start_time  end_time    s
0   02:30:01    02:30:03    154.666667
1   02:30:04    02:30:06    105.666667

In Long

First, it's easier to manipulate time if you convert the time strings to datetime objects (or if you are Dr. Who =)):

df['time'] = df['time'].apply(lambda t: datetime.strptime(t, '%H:%M:%S'))
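As an alternative to the `apply` call above, here is a minimal sketch that does the same conversion with `pd.to_datetime`, which parses the whole column at once. The sample values are copied from the question; because the strings carry no date, the parsed timestamps fall on the default date 1900-01-01, which does not matter for the grouping.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": ["2:30:01", "2:30:02", "2:30:03", "2:30:04", "2:30:05", "2:30:06"],
    "s": [5, 9, 450, 7, 10, 300],
})

# Parse the whole column in one call instead of row by row.
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")

# Show the parsed column and the resulting dtypes.
print(df.dtypes)
print(df)
```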

The heavy lifting is done by:

df.set_index('time').resample('3s',offset="1s")

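DataFrame.resample(...) acts like a group-by on the time index. '3s' groups the rows into 3-second bins, and offset="1s" shifts the bin edges by one second, so the first bin starts at 02:30:01 instead of 02:30:00. Each bin then contains exactly the three data points of one interval.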
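To see the grouping on its own, here is a small sketch that calls `.mean()` directly on the resampler instead of looping over the groups. It assumes the question's sample data; with the default settings each bin is labelled by its start time.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": ["2:30:01", "2:30:02", "2:30:03", "2:30:04", "2:30:05", "2:30:06"],
    "s": [5, 9, 450, 7, 10, 300],
})
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")

# 3-second bins whose edges are shifted by 1 second: [02:30:01, 02:30:04), [02:30:04, 02:30:07)
means = df.set_index("time")["s"].resample("3s", offset="1s").mean()

# Each bin is labelled by its start time (the default for this frequency).
for start, value in means.items():
    print(start.strftime("%H:%M:%S"), round(value, 1))
```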

And this converts the datetime object to the original string format of your timestamp:

interval_start.strftime("%H:%M:%S")

And to get the end time of the interval:

interval_start+timedelta(0,2)
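If you only need the ending time as the label, as in the desired output of the question, one possible variation is to let `resample` close and label each bin on the right instead of computing the end time by hand. This is a sketch on the question's sample data, not the only way to do it: with `closed="right"` and `label="right"` the bins become (02:30:00, 02:30:03] and (02:30:03, 02:30:06], so no offset is needed here.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": ["2:30:01", "2:30:02", "2:30:03", "2:30:04", "2:30:05", "2:30:06"],
    "s": [5, 9, 450, 7, 10, 300],
})
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")

# Right-closed, right-labelled 3-second bins: each mean is indexed by the bin's end time.
end_labelled = (
    df.set_index("time")["s"]
      .resample("3s", closed="right", label="right")
      .mean()
      .reset_index()
)

# Convert the timestamps back to plain time strings.
end_labelled["time"] = end_labelled["time"].dt.strftime("%H:%M:%S")
print(end_labelled)
```

This works without an offset only because the sample data starts one second after a multiple of 3 seconds; for data aligned differently you would likely need to adjust `offset` or `origin`.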

Upvotes: 2
