Little Bobby Tables

Reputation: 5351

How to represent only the hour part of a date in Python/Pandas

I have a CSV file representing a daily schedule, e.g.

1, a, b, 10:00, 12:00

meaning that trip 1 from a to b departs daily at 10:00 and arrives at 12:00.

I need to load this data into a Pandas data frame and work on it conveniently, e.g. find trips between given hours or sort by departure time.

What is the recommended way to represent only the hour and minutes portion of a date in Pandas?

Upvotes: 1

Views: 984

Answers (2)

elyase

Reputation: 40973

If you have a file like this one:

test.csv

trip,from,to,start,end
1,a,b,10:00,12:00
2,d,c,09:00,10:00

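You can import it as shown here. A datetime column always carries a date, so pandas fills one in (2015-01-26 in this output):

>>> import pandas as pd
>>> df = pd.read_csv('test.csv', index_col=0, parse_dates=['start', 'end'])
>>> df
     from to               start                 end
trip                                                
1       a  b 2015-01-26 10:00:00 2015-01-26 12:00:00
2       d  c 2015-01-26 09:00:00 2015-01-26 10:00:00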

Then you can query it like this:

>>> df[df['start'] > '9:30']
     from to               start                 end
trip                                                
1       a  b 2015-01-26 10:00:00 2015-01-26 12:00:00
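The string comparison above relies on pandas turning '9:30' into a full timestamp whose date matches the date filled into the column. A minimal sketch that compares only the time-of-day part through the .dt.time accessor avoids depending on that. The inlined CSV and the explicit "%H:%M" format are assumptions (with that format pandas fills in the placeholder date 1900-01-01).

```python
import datetime as dt
import io

import pandas as pd

# Inline copy of the test.csv example (assumption: same columns as above)
csv_text = """trip,from,to,start,end
1,a,b,10:00,12:00
2,d,c,09:00,10:00
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Parse the HH:MM strings explicitly; pandas fills in a placeholder date
for col in ["start", "end"]:
    df[col] = pd.to_datetime(df[col], format="%H:%M")

# Compare only the time-of-day part, so the placeholder date does not matter
late = df[df["start"].dt.time > dt.time(9, 30)]
print(late)
```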

Alternatively, just work with the hours as integers:

>>> df['hour'] = df['start'].dt.hour
>>> df
     from to               start                 end  hour
trip                                                      
1       a  b 2015-01-26 10:00:00 2015-01-26 12:00:00    10
2       d  c 2015-01-26 09:00:00 2015-01-26 10:00:00     9
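A sketch extending the integer-hour idea to the other tasks from the question: hour and minute columns via the .dt accessor, sorting by departure time, and trip durations. The inlined CSV and the "%H:%M" format are assumptions; the placeholder date pandas fills in is the same for start and end, so it cancels out in the subtraction.

```python
import io

import pandas as pd

# Inline copy of the test.csv example (assumption: same columns as above)
csv_text = """trip,from,to,start,end
1,a,b,10:00,12:00
2,d,c,09:00,10:00
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0)
for col in ["start", "end"]:
    df[col] = pd.to_datetime(df[col], format="%H:%M")

# Integer hour and minute columns via the .dt accessor
df["hour"] = df["start"].dt.hour
df["minute"] = df["start"].dt.minute

# Sort trips by departure time
by_departure = df.sort_values("start")

# Trip duration as a Timedelta (assumes no trip runs past midnight)
df["duration"] = df["end"] - df["start"]
print(by_departure)
print(df["duration"])
```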

Upvotes: 0

knightofni
knightofni

Reputation: 1956

I would just store it as a time object. You'll be able to sort and slice by comparing against other time objects, which is more convenient than storing one column for hours and one for minutes.

For example:

import datetime as dt
t0 = dt.time(hour=12, minute=10)
print(t0.hour)

12
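To illustrate the sorting and slicing claim, a small sketch in plain Python: datetime.time objects compare chronologically, so sorting and range checks work without splitting hours and minutes. The list of departure times is made up for illustration from the times in the question and the example above.

```python
import datetime as dt

# Hypothetical departure times, as time objects
departures = [dt.time(12, 10), dt.time(9, 0), dt.time(10, 0)]

# time objects compare chronologically, so sorting just works
ordered = sorted(departures)

# Range check: departures between 09:30 and 11:00 inclusive
in_window = [t for t in departures if dt.time(9, 30) <= t <= dt.time(11, 0)]
print(ordered)
print(in_window)
```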

It should be quite straightforward to get time objects into a DataFrame: load them as strings, then use apply to convert each one into a time object.

Let me know if you need help with that.
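A minimal sketch of the load-as-strings-then-apply approach described above. The inlined CSV mirrors the test.csv example from the other answer, and the "%H:%M" format is an assumption about the input. The resulting columns hold datetime.time objects (object dtype), so filtering and sorting use Python's time ordering.

```python
import datetime as dt
import io

import pandas as pd

# Inline copy of the test.csv data (assumption: same layout as above)
csv_text = """trip,from,to,start,end
1,a,b,10:00,12:00
2,d,c,09:00,10:00
"""

# 1. Load the times as plain strings
df = pd.read_csv(io.StringIO(csv_text), index_col=0,
                 dtype={"start": str, "end": str})

# 2. Use apply to turn each "HH:MM" string into a datetime.time object
for col in ["start", "end"]:
    df[col] = df[col].apply(lambda s: dt.datetime.strptime(s, "%H:%M").time())

# Time objects compare chronologically, so filtering and sorting work directly
morning = df[df["start"] < dt.time(10, 0)]
by_departure = df.sort_values("start")
print(morning)
print(by_departure)
```

Because these columns have object dtype, the .dt accessor is not available on them; if you need vectorized arithmetic such as durations, keeping full datetimes may be more convenient.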

Upvotes: 2
