Reputation: 5351
I have a CSV file representing a daily schedule, e.g.
1, a, b, 10:00, 12:00
meaning that trip 1 from a to b departs daily at 10:00 and arrives at 12:00.
I need to load this data into a Pandas data frame and work on it conveniently, e.g. find trips between given hours or sort by departure time.
What is the recommended way to represent only the hour and minutes portion of a date in Pandas?
Upvotes: 1
Views: 984
Reputation: 40973
If you have a file like this one:
test.csv
trip,from,to,start,end
1,a,b,10:00,12:00
2,d,c,09:00,10:00
You can import it with:
>>> import pandas as pd
>>> df = pd.read_csv('test.csv', index_col=0, parse_dates=[3, 4])
>>> df
from to start end
trip
1 a b 2015-01-26 10:00:00 2015-01-26 12:00:00
2 d c 2015-01-26 09:00:00 2015-01-26 10:00:00
Then you can query it like:
>>> df[df['start'] > '9:30']
from to start end
trip
1 a b 2015-01-26 10:00:00 2015-01-26 12:00:00
Alternatively, just work with the hours as integers:
>>> df['hour'] = df['start'].apply(lambda x: x.hour)
>>> df
from to start end hour
trip
1 a b 2015-01-26 10:00:00 2015-01-26 12:00:00 10
2 d c 2015-01-26 09:00:00 2015-01-26 10:00:00 9
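If the minutes matter too, a variation on the integer idea is to store minutes since midnight instead of only the hour. This is a minimal sketch, not part of the original answer: the `start_min` column name is made up for illustration, and an in-memory buffer stands in for test.csv.

```python
import io

import pandas as pd

# Same sample rows as test.csv above, held in memory for the example.
csv_text = "trip,from,to,start,end\n1,a,b,10:00,12:00\n2,d,c,09:00,10:00\n"

# No parse_dates here, so 'start' and 'end' stay as plain 'HH:MM' strings.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Split 'HH:MM' into two integer columns (0 = hours, 1 = minutes).
parts = df["start"].str.split(":", expand=True).astype(int)

# Hypothetical helper column: minutes elapsed since midnight.
df["start_min"] = parts[0] * 60 + parts[1]

# Trips departing after 09:30.
after_930 = df[df["start_min"] > 9 * 60 + 30]
```

Plain integers compare and sort quickly, at the cost of having to convert back to HH:MM for display.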
Upvotes: 0
Reputation: 1956
I would just store it as a time object. You'll be able to sort and slice by comparing against other time objects, which is more convenient than storing one column for the hour and one for the minutes.
For example:
import datetime as dt
t0 = dt.time(hour=12, minute=10)
print(t0.hour)
12
It should be quite straightforward to get time objects into a data frame: load them as strings, then use apply to convert each one into a time object.
Let me know if you need help with that.
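The load-as-strings-then-apply step could look roughly like this. It is a sketch under a few assumptions: the column names come from the sample file in the other answer, the times are always in HH:MM form, and `to_time` is a helper name invented here.

```python
import datetime as dt
import io

import pandas as pd

# Sample rows from the question; an in-memory buffer stands in for the file.
csv_text = "trip,from,to,start,end\n1,a,b,10:00,12:00\n2,d,c,09:00,10:00\n"

# Read the time columns as plain strings first.
df = pd.read_csv(io.StringIO(csv_text), index_col=0,
                 dtype={"start": str, "end": str})


def to_time(text):
    """Convert an 'HH:MM' string into a datetime.time (hypothetical helper)."""
    return dt.datetime.strptime(text.strip(), "%H:%M").time()


# Replace each string column with a column of datetime.time objects.
for col in ("start", "end"):
    df[col] = df[col].apply(to_time)

# Sort by departure time.
by_departure = df.sort_values("start")

# Trips departing between 09:30 and 11:00, inclusive.
lo, hi = dt.time(9, 30), dt.time(11, 0)
in_window = df[df["start"].apply(lambda t: lo <= t <= hi)]
```

The columns end up with object dtype, so comparisons run element by element in Python rather than vectorized; for a daily schedule that is usually fine.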
Upvotes: 2