Reputation: 85
I need help with this. I want to average my data over fixed time intervals, for example 3-second intervals. The data has one-second resolution, and I want to change the resolution to 3 seconds using the average of each interval.
The Input:
time | S |
---|---|
2:30:01 | 5 |
2:30:02 | 9 |
2:30:03 | 450 |
2:30:04 | 7 |
2:30:05 | 10 |
2:30:06 | 300 |
This is what I was thinking... Alas, this is not changing the time intervals and the average isn't at the right index.
import pandas
step = 3 #3 seconds
df = df.groupby(df.index // step).mean()
Note: If you think there's another way to approach the problem, I'm all ears.
Desired output:
time | S |
---|---|
2:30:03 | 154.7 |
2:30:06 | 105.7 |
Upvotes: 1
Views: 810
Reputation: 122012
import pandas as pd
from datetime import datetime, timedelta

x = [['2:30:01', '5'],
     ['2:30:02', '9'],
     ['2:30:03', '450'],
     ['2:30:04', '7'],
     ['2:30:05', '10'],
     ['2:30:06', '300']]

df = pd.DataFrame(x, columns=['time', 's'])
# Parse the time strings into datetime objects and the values into integers.
df['time'] = df['time'].apply(lambda t: datetime.strptime(t, '%H:%M:%S'))
df['s'] = df['s'].astype(int)

# Group the rows into 3-second bins that start at second 1, then average each bin.
df_new = pd.DataFrame([{'start_time': interval_start.strftime("%H:%M:%S"),
                        'end_time': (interval_start + timedelta(0, 2)).strftime("%H:%M:%S"),
                        's': sum(rows['s']) / len(rows['s'])}
                       for interval_start, rows in
                       df.set_index('time').resample('3s', offset="1s")])
[out]:
start_time end_time s
0 02:30:01 02:30:03 154.666667
1 02:30:04 02:30:06 105.666667
First, it's easier to manipulate times if you convert the time strings to datetime objects (or if you are Dr. Who =)):
df['time'] = df['time'].apply(lambda t: datetime.strptime(t, '%H:%M:%S'))
The heavy lifting is done by:
df.set_index('time').resample('3s', offset="1s")
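DataFrame.resample(...) acts like a group-by on the time index: '3s' groups the rows into 3-second bins, and offset="1s" shifts the bin edges by one second so that the first bin starts at 2:30:01 instead of 2:30:00. Each bin therefore holds all the data points within one 3-second interval.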
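Since the resampler behaves like a group-by, you can probably skip the manual loop and call .mean() on it directly. This is only a minimal sketch, assuming the same sample data as above; the names means and start_labels are mine, pd.to_datetime is swapped in for the strptime lambda, and the offset argument assumes pandas 1.1 or newer:

```python
import pandas as pd

# Same sample data as in the question (one-second resolution).
df = pd.DataFrame({
    "time": ["2:30:01", "2:30:02", "2:30:03", "2:30:04", "2:30:05", "2:30:06"],
    "s": [5, 9, 450, 7, 10, 300],
})
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")

# 3-second bins shifted by 1 second, so the first bin covers 02:30:01 to 02:30:03.
means = df.set_index("time")["s"].resample("3s", offset="1s").mean()

# Each bin is labelled with its start time by default.
start_labels = means.index.strftime("%H:%M:%S").tolist()
print(start_labels)    # ['02:30:01', '02:30:04']
print(means.tolist())  # roughly [154.67, 105.67]
```

The result is a Series indexed by the start of each interval, which is the same information as the start_time and s columns above.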
And this converts the datetime object to the original string format of your timestamp:
interval_start.strftime("%H:%M:%S")
And to get the end time of the interval:
interval_start + timedelta(0, 2)
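If you would rather have each average labelled with the end of its interval, as in the output the question asks for (2:30:03 and 2:30:06), one option is to close and label the bins on the right instead of using an offset. Again just a sketch under the same assumptions about the data; end_labelled, end_labels and rounded are names I made up for illustration:

```python
import pandas as pd

# Same sample data as in the question (one-second resolution).
df = pd.DataFrame({
    "time": ["2:30:01", "2:30:02", "2:30:03", "2:30:04", "2:30:05", "2:30:06"],
    "s": [5, 9, 450, 7, 10, 300],
})
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S")

# closed="right" makes each bin include its right edge, e.g. (02:30:00, 02:30:03],
# and label="right" names the bin after that right edge.
end_labelled = (
    df.set_index("time")["s"]
      .resample("3s", closed="right", label="right")
      .mean()
)

end_labels = end_labelled.index.strftime("%H:%M:%S").tolist()
rounded = end_labelled.round(1).tolist()
print(end_labels)  # ['02:30:03', '02:30:06']
print(rounded)     # [154.7, 105.7]
```

This works here because the default bin edges fall on multiples of 3 seconds from midnight, so 2:30:03 and 2:30:06 are already edges; with data that starts at a different second you may still need offset.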
Upvotes: 2