Reputation: 4122
I want to convert 5-minute intervals, which are stored as integers, into a time format.
For example, in the data below, an interval of 0 should become 00:00, an interval of 5 should become 00:05, and so on.
date interval
2012-10-01 0
2012-10-01 5
2012-10-01 10
2012-10-01 15
2012-10-01 20
2012-10-01 25
2012-10-01 30
2012-10-01 35
2012-10-01 40
I thought the following would work:
df['interval'] = pd.to_datetime(df['interval'], format='%H:%M').dt.hour
But it returns this error:
time data 0 does not match format '%H:%M' (match)
Which is logical, but it leaves me unclear how to set the format argument of to_datetime. I can't find anything in the pandas documentation that helps.
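One way to make `to_datetime` accept these integers is to zero-pad each value to four digits and parse it with the format `'%H%M'`. This assumes `interval` encodes clock time as HHMM (0 is 00:00, 5 is 00:05, 2355 is 23:55), which the 0 to 2355 range in the update suggests. The small frame below is hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data: `interval` holds clock times encoded as HHMM integers.
df = pd.DataFrame({
    "date": ["2012-10-01"] * 5,
    "interval": [0, 5, 55, 100, 2355],
})

# Zero-pad to four digits ("0" -> "0000", "55" -> "0055") so '%H%M' can parse it.
padded = df["interval"].astype(str).str.zfill(4)
parsed = pd.to_datetime(padded, format="%H%M")

# to_datetime fills in a placeholder date (1900-01-01); keep only the time part.
df["time"] = parsed.dt.time                # datetime.time objects
df["hhmm"] = parsed.dt.strftime("%H:%M")   # text such as "00:05"
print(df)
```

The padding step is what the original attempt was missing: `'%H:%M'` expects a colon-separated string, while the raw data is a bare integer with no leading zeros.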
Update
Unfortunately I can't get any of these to work with my actual DataFrame. I should have added that the interval variable runs from 0 to 2355, over and over: the column has 17,568 rows cycling through those values.
@padraig, with both your answers I get this error:
ValueError: hour must be in 0..23
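The error comes from splitting the value with `divmod(x, 60)`, which treats `interval` as minutes since midnight. For 2355 that gives `divmod(2355, 60) == (39, 15)`, and `datetime.time(39, 15)` rejects hour 39. If the values are HHMM-encoded, splitting on 100 instead gives valid hours and minutes. A minimal sketch; the helper name `hhmm_to_time` and the sample values are illustrative assumptions:

```python
from datetime import time

import pandas as pd


def hhmm_to_time(value):
    """Convert an HHMM-encoded integer (e.g. 2355) to a datetime.time."""
    # The hundreds carry the hour, the last two digits carry the minutes.
    hours, minutes = divmod(int(value), 100)
    return time(hours, minutes)


# Hypothetical sample values, including the 2355 that triggered the error.
df = pd.DataFrame({"interval": [0, 5, 1230, 2355]})
df["time"] = df["interval"].apply(hhmm_to_time)
print(df)
```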
Upvotes: 1
Reputation: 109528
Although you asked for the time, you are probably better off with a pandas Timestamp, which records both date and time.
Given your interval, you can split it into hours and minutes:
df['hour'] = df.interval // 100
df['mins'] = df.interval % 100
You can now create a timestamp, optionally with a timezone (e.g. UTC):
from pytz import UTC
df['timestamp'] = df.apply(lambda row: pd.Timestamp('{0} {1}:{2}'.format(row.date, row.hour, row.mins), tz=UTC), axis=1)
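The row-wise `apply` calls a Python function once per row, which can be slow on 17,568 rows. A vectorized sketch of the same idea adds hour and minute offsets to the parsed date instead of building and re-parsing a string. It assumes `date` holds strings like `'2012-10-01'` and `interval` holds HHMM integers; the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical sample rows; `interval` is HHMM-encoded.
df = pd.DataFrame({
    "date": ["2012-10-01", "2012-10-01", "2012-10-02"],
    "interval": [0, 2355, 5],
})

# Split HHMM into whole hours and leftover minutes without a Python-level loop.
hours = df["interval"] // 100
mins = df["interval"] % 100

df["timestamp"] = (
    pd.to_datetime(df["date"])
    + pd.to_timedelta(hours, unit="h")
    + pd.to_timedelta(mins, unit="min")
)
# Optional: attach a timezone, as the answer does with UTC.
df["timestamp"] = df["timestamp"].dt.tz_localize("UTC")
print(df)
```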
>>> df
date interval hour mins timestamp
0 2012-10-01 0 0 0 2012-10-01 00:00:00
1 2012-10-01 5 0 5 2012-10-01 00:05:00
2 2012-10-01 10 0 10 2012-10-01 00:10:00
3 2012-10-01 15 0 15 2012-10-01 00:15:00
4 2012-10-01 20 0 20 2012-10-01 00:20:00
5 2012-10-01 25 0 25 2012-10-01 00:25:00
6 2012-10-01 30 0 30 2012-10-01 00:30:00
7 2012-10-01 35 0 35 2012-10-01 00:35:00
8 2012-10-01 40 0 40 2012-10-01 00:40:00
Given the timestamp, you can access the other properties such as time as follows:
>>> df.timestamp[5].time()
datetime.time(0, 25)
If you truly want time as a separate column (formatted as text, but optionally any other desired time format):
df['time'] = df.timestamp.apply(lambda time: time.strftime('%H:%M'))
>>> df
date interval hour mins timestamp time
0 2012-10-01 0 0 0 2012-10-01 00:00:00+00:00 00:00
1 2012-10-01 5 0 5 2012-10-01 00:05:00+00:00 00:05
2 2012-10-01 10 0 10 2012-10-01 00:10:00+00:00 00:10
3 2012-10-01 15 0 15 2012-10-01 00:15:00+00:00 00:15
4 2012-10-01 20 0 20 2012-10-01 00:20:00+00:00 00:20
5 2012-10-01 25 0 25 2012-10-01 00:25:00+00:00 00:25
6 2012-10-01 30 0 30 2012-10-01 00:30:00+00:00 00:30
7 2012-10-01 35 0 35 2012-10-01 00:35:00+00:00 00:35
8 2012-10-01 40 0 40 2012-10-01 00:40:00+00:00 00:40
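For a datetime column, the `.dt` accessor offers vectorized versions of the per-element `.strftime()` and `.time()` calls. A short sketch; the timestamps below are hypothetical stand-ins for the `timestamp` column built above:

```python
import pandas as pd

# Hypothetical tz-aware timestamps standing in for the `timestamp` column.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2012-10-01 00:00", "2012-10-01 00:25", "2012-10-01 23:55"]
    ).tz_localize("UTC")
})

# Vectorized equivalents of calling .strftime() / .time() on each element.
df["time"] = df["timestamp"].dt.strftime("%H:%M")  # text, e.g. "00:25"
df["time_obj"] = df["timestamp"].dt.time           # datetime.time objects
print(df)
```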
Upvotes: 1
Reputation: 180391
from datetime import time
import pandas as pd

def to_time(x):
    # Treats x as minutes since midnight (e.g. 65 -> 01:05)
    hours, mn = divmod(x, 60)
    return time(hours, mn)

df["interval"] = df["interval"].apply(to_time)
Output:
date interval
0 2012-10-01 00:00:00
1 2012-10-01 00:05:00
2 2012-10-01 00:10:00
3 2012-10-01 00:15:00
4 2012-10-01 00:20:00
5 2012-10-01 00:25:00
6 2012-10-01 00:30:00
7 2012-10-01 00:35:00
8 2012-10-01 00:40:00
Or with read_csv, returning strings so the hours can simply keep counting past 23:
import pandas as pd

def to_time(x):
    # x arrives as a string; treat it as minutes since midnight
    hours, mn = divmod(int(x), 60)
    return "{:02}:{:02}".format(hours, mn)

df = pd.read_csv("test.csv", date_parser=to_time, parse_dates=["interval"])
print(df)
If we change the last interval to 2355, this outputs:
date interval
0 2012-10-01 00:00
1 2012-10-01 00:05
2 2012-10-01 00:10
3 2012-10-01 00:15
4 2012-10-01 00:20
5 2012-10-01 00:25
6 2012-10-01 00:30
7 2012-10-01 00:35
8 2012-10-01 39:15
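The `date_parser` argument of `read_csv` is deprecated as of pandas 2.0, and the division by 60 is what turns 2355 into `39:15`. A sketch that converts HHMM values while reading, using `read_csv`'s `converters` argument instead; the CSV contents and the helper name `hhmm_to_text` are assumptions for illustration:

```python
import io

import pandas as pd


def hhmm_to_text(value):
    """Turn an HHMM field such as '2355' into 'HH:MM' text."""
    hours, minutes = divmod(int(value), 100)
    return f"{hours:02d}:{minutes:02d}"


# Hypothetical stand-in for test.csv.
csv_text = """date,interval
2012-10-01,0
2012-10-01,5
2012-10-01,2355
"""

# converters applies the function to each raw field of the named column while reading.
df = pd.read_csv(io.StringIO(csv_text), converters={"interval": hhmm_to_text})
print(df)
```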
Upvotes: 2
Reputation: 6738
Just format it as a string.
Method 1 (old-style % formatting):
"%02d:%02d" % (interval // 60, interval % 60)
Method 2 (str.format):
"{:02d}:{:02d}".format(interval // 60, interval % 60)
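Both formulas divide by 60, which fits an interval counted in minutes since midnight. If the values are HHMM-encoded (0 to 2355), a purely string-based sketch zero-pads the digits and inserts a colon; the function name `hhmm_label` is hypothetical:

```python
def hhmm_label(interval):
    """Format an HHMM integer (e.g. 5 or 2355) as 'HH:MM' by slicing its padded digits."""
    digits = f"{interval:04d}"            # 5 -> "0005", 2355 -> "2355"
    return f"{digits[:2]}:{digits[2:]}"   # "00:05", "23:55"


print([hhmm_label(i) for i in (0, 5, 100, 2355)])
```

With a DataFrame column, the same function could be applied through `df["interval"].map(hhmm_label)`.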
Upvotes: 2