group pandas time-series data frame using specific time intervals

Question

I have a large CSV file with timestamps in ISO format, e.g. 2015-04-01 10:26:41. The data span multiple months, with entries ranging from 30 seconds to multiple hours apart. Its columns are id, time, speed.

Ultimately I want to group the data into 15-minute intervals, then calculate the average speed over however many entries fall in each 15-minute slot.

I am trying to use pandas because it seems to have solid time-series tools and this looks like it should be easy, but I am falling at the first hurdle.

So far I have imported the CSV as a DataFrame, and all columns have a dtype of object. I have sorted the data by date and am now trying to group the entries by a time interval, which is where I'm struggling. Based on Google searching, I have tried to resample the data using df.resample('5min', how=sum), which gives the error TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex. I also thought about trying the groupby method, perhaps with a lambda as in df.groupby(lambda x:x.minutes + 5), which produces the error AttributeError: 'str' object has no attribute 'minutes'.

Basically I'm a little confused about a) whether pandas recognises the time-series data as timestamps, given that its dtype is object, and b) if it does recognise it, how to get the time intervals right.

Keen to learn if anyone could point me in the right direction.

DF looks like this

        0        1                    2          3
0      id  boat_id                 time      speed
1  386226       32  2015-01-15 05:14:32  4.2343243
2  386285       32  2015-01-15 05:44:57    3.45234

Alexander · Accepted Answer

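First, it looks like you read a blank row at the top of the file, which is why the real column names (id, boat_id, time, speed) show up as the first data row and every column ends up with dtype object. You probably want to skip the first row in your file: pd.read_csv(filename, skiprows=1).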

You should convert the text representation of the time into a DatetimeIndex using pd.to_datetime().

df.set_index(pd.to_datetime(df['time']), inplace=True)
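As a minimal sketch of that conversion, assuming a small made-up sample in place of the real file (the csv_text variable and its rows are illustrative only), you can check the index type before trying to resample:

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real CSV; the rows are made up.
csv_text = """id,boat_id,time,speed
386226,32,2015-01-15 05:14:32,4.2343243
386285,32,2015-01-15 05:44:57,3.45234
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df["time"].dtype)  # the timestamps are still plain strings at this point

# Parse the strings into timestamps and use the result as the index.
df = df.set_index(pd.to_datetime(df["time"]))
print(type(df.index).__name__)  # DatetimeIndex
```

Once the index is a DatetimeIndex, the TypeError from resample should no longer appear.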

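You should then be able to resample. The how= argument has been removed from recent pandas versions, so call the aggregation as a method instead, and select the speed column so that non-numeric columns are not averaged.

df['speed'].resample('15min').mean()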
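Putting the steps together, here is a hedged end-to-end sketch. It assumes a clean header row and uses invented sample rows (csv_text is hypothetical), and it calls the aggregation as a method, resample(...).mean(), which is the form current pandas versions expect instead of the how= keyword:

```python
import io
import pandas as pd

# Hypothetical sample data; the values are chosen only to make the bins easy to follow.
csv_text = """id,boat_id,time,speed
1,32,2015-01-15 05:01:00,4.0
2,32,2015-01-15 05:14:00,6.0
3,32,2015-01-15 05:44:00,3.0
"""

# parse_dates converts the time column while reading the file.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])
df = df.set_index("time")

# Average speed per 15-minute slot; slots with no entries come out as NaN.
avg_speed = df["speed"].resample("15min").mean()
print(avg_speed)
```

Here the first two rows fall in the 05:00 slot and are averaged together, the 05:15 slot is empty, and the last row lands in the 05:30 slot. Appending .dropna() would discard the empty slots if they are not wanted.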
