Reputation: 1879
I have a large CSV file with timestamp data in the ISO format 2015-04-01 10:26:41. The data span multiple months, with entries ranging from 30 seconds to multiple hours apart. Its columns are id, time, speed.
Ultimately I want to group the data into 15-minute intervals, then calculate the average speed over however many entries fall in each 15-minute slot.
I am trying to use pandas because it seems to have solid time-series tools and it might be easy to do this, but I am falling at the first hurdle.
So far I have imported the CSV as a dataframe, and all columns have a dtype of object. I have sorted the data by date and am now trying to group the entries by a time interval, which is where I'm struggling. Based on Google searching, I have tried to resample the data using this code:
df.resample('5min', how=sum)
Here I get the error TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex. I was thinking about trying the groupby method, perhaps using a lambda, as in
df.groupby(lambda x:x.minutes + 5)
which produces the error AttributeError: 'str' object has no attribute 'minutes'.
Basically I'm a little confused about a) whether pandas has the time-series data in a format it recognises, since its dtype is object, and b) if it can recognise it, how to get the time intervals right.
Keen to learn if anyone could point me in the right direction.
DF looks like this
        0        1                    2          3
0      id  boat_id                 time      speed
1  386226       32  2015-01-15 05:14:32  4.2343243
2  386285       32  2015-01-15 05:44:57    3.45234
Upvotes: 1
Reputation: 81
Alexander's answer is correct; also note that you can do
df = pd.read_csv('myfile.csv', parse_dates=['time'])
Your date column should then have a datetime type if the format is sane. You can then set the index and resample as in Alexander's answer.
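As a minimal sketch, parsing and indexing can also be combined in a single read_csv call. This assumes the timestamp column is named time, as in the question; the inline CSV text is made-up sample data standing in for the real file.

```python
import io
import pandas as pd

# Hypothetical sample mirroring the rows shown in the question.
csv_text = """id,boat_id,time,speed
386226,32,2015-01-15 05:14:32,4.2343243
386285,32,2015-01-15 05:44:57,3.45234
"""

# index_col selects the column to use as the index; parse_dates=True then
# asks pandas to parse that index as dates.
df = pd.read_csv(io.StringIO(csv_text), index_col="time", parse_dates=True)

print(type(df.index).__name__)  # DatetimeIndex
print(df.dtypes)
```

With the index already a DatetimeIndex, the frame can be resampled directly without a separate conversion step.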
Upvotes: 1
Reputation: 109626
First, it looks like you read a blank row. You probably want to skip the first row in your file with pd.read_csv(filename, skiprows=1).
You should convert the text representation of the time into a DatetimeIndex using pd.to_datetime().
df.set_index(pd.to_datetime(df['time']), inplace=True)
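To illustrate what the conversion changes, here is a small sketch on a made-up frame in which every column arrived as strings, as the question describes. The to_numeric step for speed is an assumption: it is only needed if the speed column is still of dtype object after loading.

```python
import pandas as pd

# Hypothetical frame where every column arrived as strings (dtype object).
df = pd.DataFrame({
    "id": ["386226", "386285"],
    "time": ["2015-01-15 05:14:32", "2015-01-15 05:44:57"],
    "speed": ["4.2343243", "3.45234"],
})

# Parse the timestamp strings and make them the index.
df["time"] = pd.to_datetime(df["time"])
df = df.set_index("time")

# Speeds also need to be numeric before they can be averaged.
df["speed"] = pd.to_numeric(df["speed"])

print(type(df.index).__name__)  # DatetimeIndex
print(df["speed"].dtype)        # float64
```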
You should then be able to resample.
# how= is no longer accepted by recent pandas; call the aggregation as a method
df['speed'].resample('15min').mean()
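A minimal sketch of the 15-minute averaging on made-up readings, written in the method-call form that recent pandas versions expect (the how= argument is no longer accepted there). The sample timestamps and speeds are hypothetical. It also shows the groupby route the question was reaching for, using pd.Grouper.

```python
import pandas as pd

# Hypothetical readings: three in the 05:00 slot, none in 05:15, one in 05:30.
df = pd.DataFrame(
    {"speed": [4.0, 6.0, 5.0, 3.0]},
    index=pd.to_datetime([
        "2015-01-15 05:01:00",
        "2015-01-15 05:07:30",
        "2015-01-15 05:14:59",
        "2015-01-15 05:44:57",
    ]),
)

# Mean speed per 15-minute bin; bins with no readings come out as NaN.
avg = df["speed"].resample("15min").mean()

# Drop the empty bins if only populated slots are of interest.
avg_nonempty = avg.dropna()

# Same grouping expressed through groupby with a time-based Grouper.
avg_grouped = df.groupby(pd.Grouper(freq="15min"))["speed"].mean()

print(avg_nonempty)
```

Because the data has gaps of several hours, many bins will be empty; whether to drop them or keep them as NaN depends on what the averages are used for.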
Upvotes: 2