I have some csv data of accelerometer readings in the following format (not exactly this, the real data has a higher sampling rate):
2013-09-28 17:36:50.322120, 0.152695, -0.545074, -0.852997
2013-09-28 17:36:50.622988, 0.141800, -0.554947, -0.867935
2013-09-28 17:36:51.923802, 0.132431, -0.547089, -0.879333
2013-09-28 17:36:52.124641, 0.124329, -0.530243, -0.887741
2013-09-28 17:36:52.425341, 0.122269, -0.519669, -0.900269
2013-09-28 17:36:52.926202, 0.122879, -0.502151, -0.902023
....
2013-09-28 17:49:14.440343, 0.005447, -0.623016, -0.773529
2013-09-28 17:49:14.557806, 0.009048, -0.623093, -0.790909
2013-09-28 17:49:14.758442, 0.007217, -0.617386, -0.815796
I loaded them using pandas:
import pandas as pd
accDF = pd.read_csv(accFileName, header=0, sep=',')
accDF.columns = ['time', 'x', 'y', 'z']
accDF = accDF.set_index(['time'])
The accelerometer data is not uniformly sampled, and I want to group data by every 10 or 20 or 30 seconds and apply a custom function to the data group.
If the data was uniformly sampled, it would have been easy to apply a rolling function. However, since it is not, I want to apply groupby using timestamp interval. Doing so with an interval of one second is easy:
accDF_win = accDF.groupby(accDF.index.second).apply(...)  # etc.
However, I cannot figure out how to group by an arbitrary number of seconds and then apply a function to it.
With TimeGrouper, I can do the following:
accDF_win = accDF.groupby(pd.TimeGrouper(freq='3Min'))
for an arbitrary number of minutes, but it seems TimeGrouper doesn't support second-level resolution.
Thanks in advance for your help.
Upvotes: 8
Views: 10412
Reputation: 35255
I don't think you need a TimeGrouper. Resampling by the second is supported. You're not the first person to try 'S' for seconds (so maybe pandas should support it?); the correct string is 's'.
df = pd.read_csv(filename, parse_dates=True, sep=',', index_col=0, header=None)
df.columns = ['x', 'y', 'z']
df.resample('10s', how=f) # where f is your function
Edit: Actually, here on my version (the soon-to-be-released 0.13) I find that '10S' works as well. Maybe your whole problem was that the dates weren't being parsed.
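For instance, here is a minimal sketch of one possible f (the rms helper is hypothetical, just to make the example concrete; it uses the same how= resample call as above):
import numpy as np
import pandas as pd

# hypothetical aggregation: root-mean-square of each 10-second bin
def rms(values):
    return np.sqrt(np.mean(np.square(values))) if len(values) else np.nan

df = pd.read_csv(filename, parse_dates=True, sep=',', index_col=0, header=None)
df.columns = ['x', 'y', 'z']
windowed = df.resample('10s', how=rms)  # one rms value per column per 10-second bin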
Upvotes: 2
Reputation: 9709
First of all, you have to convert the datetime column to a Python datetime object (in case you didn't already).
>>> import pandas as pd
>>> from dateutil import parser
>>> df=pd.read_csv("test.csv", header=None)
#convert column 0 to a datetime index, e.g. with dateutil
>>> df=df.set_index(df[0].map(parser.parse))
Then use pd.TimeGrouper like this:
>>> df[3].groupby(pd.TimeGrouper('10S')).head()
2013-09-28 17:36:40 2013-09-28 17:36:40.322120 -0.852997
2013-09-28 17:36:41.622988 -0.867935
2013-09-28 17:36:42.923802 -0.879333
2013-09-28 17:36:43.124641 -0.887741
2013-09-28 17:36:45.425341 -0.900269
2013-09-28 17:36:50 2013-09-28 17:36:52.926202 -0.902023
2013-09-28 17:36:53.322120 -0.852997
2013-09-28 17:36:53.622988 -0.867935
2013-09-28 17:36:54.923802 -0.879333
2013-09-28 17:36:54.124641 -0.887741
2013-09-28 17:49:50 2013-09-28 17:49:56.440343 -0.773529
2013-09-28 17:49:56.557806 -0.790909
2013-09-28 17:49:57.758442 -0.815796
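To apply your own function to each of those windows, you can chain .apply onto the same groupby; a minimal sketch (the peak_to_peak helper here is hypothetical):
>>> def peak_to_peak(group):
...     # hypothetical per-window statistic: max minus min within the window
...     return group.max() - group.min()
>>> df[3].groupby(pd.TimeGrouper('10S')).apply(peak_to_peak)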
Or have a look at the resampling functions here. Maybe you could apply a custom resampling function instead of using the groupby method.
df[3].resample("10S",how=lambda x: Whateveryouwanttodo)
Without any function, resample defaults to the mean and fills empty intervals with NaN:
>>> df[3].resample("10S")
0
2013-09-28 17:36:40 -0.877655
2013-09-28 17:36:50 -0.884617
2013-09-28 17:37:00 NaN
2013-09-28 17:37:10 NaN
2013-09-28 17:37:20 NaN
2013-09-28 17:37:30 NaN
2013-09-28 17:37:40 NaN
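If those empty intervals are unwanted, one option (a sketch, not something the answer itself does) is to drop them afterwards:
>>> df[3].resample("10S").dropna()  # keep only bins that actually contained samples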
Upvotes: 4