jma

Reputation: 3779

Pandas read_csv reading time offset strings

I have a text file with a column of values like '10:15.3', meaning 10 minutes 15.3 seconds after some canonical event. When I read this with read_csv, I get strings:

>>> df.time.head()
0    08:32.0
1    08:38.0
2    08:39.0
3    08:43.0
4    09:15.0
Name: time, dtype: object
>>> df.time.head()[:1][0]
'08:32.0'

I feel like I should be able to get seconds easily enough within pandas, either by specifying a conversion in read_csv or (probably better, so I have both) by appending a new column, but I'm not seeing how to do it. I'm pretty sure this is just me being dense.

Can anyone offer a tip to help me get unstuck?

Upvotes: 0

Views: 737

Answers (4)

Jeff

Reputation: 129008

With pandas 0.15 this is quite easy to do; see the pandas documentation on time deltas.

Your format is not a 'regular' format, but you can easily make it one by prepending '00:' as the hours field.

In [20]: pd.Timedelta('00:' + '10:15.3')
Out[20]: Timedelta('0 days 00:10:15.300000')

In [19]: s
Out[19]: 
0    08:32.0
1    08:38.0
2    08:39.0
3    08:43.0
4    09:15.0
Name: 1, dtype: object

Parse it to a timedelta.

In [15]: res = pd.to_timedelta('00:' + s)

In [16]: res
Out[16]: 
0   00:08:32
1   00:08:38
2   00:08:39
3   00:08:43
4   00:09:15
Name: 1, dtype: timedelta64[ns]

In 0.15, this gives the seconds field as displayed in the reduced hh:mm:ss form, not the total number of seconds.

In [17]: res.dt.seconds
Out[17]: 
0    32
1    38
2    39
3    43
4    15
dtype: int64

If you need the total seconds represented, do this (this is called frequency conversion):

In [18]: res.astype('timedelta64[s]')
Out[18]: 
0    512
1    518
2    519
3    523
4    555
Name: 1, dtype: float64
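
To tie the steps above together and keep both the original strings and the numeric value, here is a minimal sketch. The sample data and the column name `seconds` are assumptions for illustration, and it uses `Series.dt.total_seconds()`, which is available in later pandas versions, rather than the `astype` conversion shown above.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the asker's file.
csv_text = "time\n08:32.0\n08:38.0\n08:39.0\n08:43.0\n09:15.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Prepend an hours field so the strings parse as hh:mm:ss.f.
offsets = pd.to_timedelta('00:' + df['time'])

# Keep the original strings and add the total seconds as a new float column.
df['seconds'] = offsets.dt.total_seconds()
```

The `time` column is left untouched, so both representations stay available side by side.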

Upvotes: 1

Woody Pride

Reputation: 13965

You can use the datetime.time object. Its constructor takes hours, minutes, seconds, and microseconds as integers, so you just need to cast the relevant part of each string to int and pass it to the datetime.time constructor.

So in your case:

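import datetime
import pandas as pd

df = pd.read_csv('your_csv.csv')
# 'MM:SS.f' -> time(hour=0, minute, second, microsecond); the single decimal digit is tenths of a second
df.time = pd.Series([datetime.time(0, int(val[:2]), int(val[3:5]), int(val[6:]) * 100000)
                     for val in df.time], index=df.index)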

Upvotes: 1

Rusty

Reputation: 914

Haven't worked with pandas, but I would extract seconds like this:

from time import strptime
# The strings are minutes:seconds.fraction, so the format is %M:%S.%f
print(strptime('09:12.14', '%M:%S.%f').tm_sec)

Upvotes: 0

Gullydwarf

Reputation: 355

I can't really help you with pandas internals, but there is an easy pythonic way to get the seconds in numerical format:

timeStr = '08:32.0'
# split(':') gives ['08', '32.0']; take the second item and cast it to float.
seconds = float(timeStr.split(':')[1])
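
The snippet above only returns the seconds part of the string. If the total offset including the minutes is what's wanted, a small helper along these lines might do; the name `offset_to_seconds` is made up for illustration, and it assumes every value has exactly the 'MM:SS.f' shape shown in the question.

```python
def offset_to_seconds(text):
    """Convert an 'MM:SS.f' offset string to total seconds as a float."""
    # Split into the minutes part and the (possibly fractional) seconds part.
    minutes, seconds = text.split(':')
    return int(minutes) * 60 + float(seconds)


total = offset_to_seconds('08:32.0')
```

A function like this could probably also be handed to read_csv through its `converters` argument, e.g. `converters={'time': offset_to_seconds}`, although that would replace the original strings rather than keep them alongside.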

Upvotes: 0
