RDJ

Reputation: 4122

Pandas: Convert time interval integers to times

I want to convert 5 minute intervals - which are integers - into time format.

For example, below, the 0 interval should become 00:00, the 5 interval should become 00:05, etc.

date        interval
2012-10-01  0
2012-10-01  5
2012-10-01  10
2012-10-01  15
2012-10-01  20
2012-10-01  25
2012-10-01  30
2012-10-01  35
2012-10-01  40

I thought the following would work:

df['interval'] = pd.to_datetime(df['interval'], format='%H:%M').dt.hour

But it returns this error:

time data 0 does not match format '%H:%M' (match)

That makes sense, but it leaves me unsure how to set the format argument of to_datetime. I can't find anything in the pandas documentation that helps.


Update

Unfortunately I can't get any of these to work with my actual DataFrame. I should add that the interval column runs from 0 to 2355, over and over; the column has 17,568 rows of these 0 to 2355 values.

@padraig, with both your answers I get this error:

ValueError: hour must be in 0..23

Upvotes: 1

Views: 3202

Answers (3)

Alexander

Reputation: 109528

Although you asked for the time, you are probably better off with a Pandas timestamp which records both date and time.

Given your interval, which encodes the time as HHMM (0 to 2355), you can split it into hours and minutes:

df['hour'] = df.interval // 100
df['mins'] = df.interval % 100

You can now create a timestamp, optionally with a timezone (e.g. UTC):

from pytz import UTC

df['timestamp'] = df.apply(lambda row: pd.Timestamp('{0} {1}:{2}'.format(row.date, row.hour, row.mins), tz=UTC), axis=1)

>>> df
         date  interval  hour  mins                  timestamp
0  2012-10-01         0     0     0  2012-10-01 00:00:00+00:00
1  2012-10-01         5     0     5  2012-10-01 00:05:00+00:00
2  2012-10-01        10     0    10  2012-10-01 00:10:00+00:00
3  2012-10-01        15     0    15  2012-10-01 00:15:00+00:00
4  2012-10-01        20     0    20  2012-10-01 00:20:00+00:00
5  2012-10-01        25     0    25  2012-10-01 00:25:00+00:00
6  2012-10-01        30     0    30  2012-10-01 00:30:00+00:00
7  2012-10-01        35     0    35  2012-10-01 00:35:00+00:00
8  2012-10-01        40     0    40  2012-10-01 00:40:00+00:00

Given the timestamp, you can access the other properties such as time as follows:

>>> df.timestamp[5].time()
datetime.time(0, 25)

If you truly want time as a separate column (formatted as text, but optionally any other desired time format):

df['time'] = df.timestamp.apply(lambda time: time.strftime('%H:%M'))

>>> df
         date  interval  hour  mins                  timestamp   time
0  2012-10-01         0     0     0  2012-10-01 00:00:00+00:00  00:00
1  2012-10-01         5     0     5  2012-10-01 00:05:00+00:00  00:05
2  2012-10-01        10     0    10  2012-10-01 00:10:00+00:00  00:10
3  2012-10-01        15     0    15  2012-10-01 00:15:00+00:00  00:15
4  2012-10-01        20     0    20  2012-10-01 00:20:00+00:00  00:20
5  2012-10-01        25     0    25  2012-10-01 00:25:00+00:00  00:25
6  2012-10-01        30     0    30  2012-10-01 00:30:00+00:00  00:30
7  2012-10-01        35     0    35  2012-10-01 00:35:00+00:00  00:35
8  2012-10-01        40     0    40  2012-10-01 00:40:00+00:00  00:40
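
The row-wise apply above can get slow on a frame with thousands of rows. A vectorized sketch of the same idea follows, assuming interval is an HHMM-encoded integer (0 to 2355) and date holds ISO date strings; the sample frame and the choice to skip the timezone are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Small illustrative frame (assumed data, HHMM-encoded intervals).
df = pd.DataFrame({
    "date": ["2012-10-01"] * 4,
    "interval": [0, 5, 40, 2355],
})

# Split HHMM into its hour and minute parts.
hours = df["interval"] // 100
minutes = df["interval"] % 100

# Add the time of day to the date as timedeltas, without a Python-level loop.
df["timestamp"] = (
    pd.to_datetime(df["date"])
    + pd.to_timedelta(hours, unit="h")
    + pd.to_timedelta(minutes, unit="min")
)

# Text version of the time, if a separate column is wanted.
df["time"] = df["timestamp"].dt.strftime("%H:%M")
print(df)
```

If a timezone is needed, df["timestamp"].dt.tz_localize("UTC") can be applied afterwards.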

Upvotes: 1

Padraic Cunningham

Reputation: 180391

from datetime import time
import pandas as pd

def to_time(x):
    # Treats x as minutes since midnight, e.g. 65 -> 01:05.
    hours, mn = divmod(x, 60)
    return time(hours, mn)

df["interval"] = df["interval"].apply(to_time)

Output:

         date  interval
0  2012-10-01  00:00:00
1  2012-10-01  00:05:00
2  2012-10-01  00:10:00
3  2012-10-01  00:15:00
4  2012-10-01  00:20:00
5  2012-10-01  00:25:00
6  2012-10-01  00:30:00
7  2012-10-01  00:35:00
8  2012-10-01  00:40:00

Or with read_csv and just incrementing the hours:

from datetime import time
import pandas as pd
def to_time(x):
    hours, mn = divmod(int(x), 60)
    return "{:02}:{:02}".format(hours, mn)
df = pd.read_csv("test.csv", date_parser=to_time, parse_dates=["interval"])
print(df)

If we change the last "interval" value to 2355, this outputs:

         date interval
0  2012-10-01    00:00
1  2012-10-01    00:05
2  2012-10-01    00:10
3  2012-10-01    00:15
4  2012-10-01    00:20
5  2012-10-01    00:25
6  2012-10-01    00:30
7  2012-10-01    00:35
8  2012-10-01    39:15
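
The 39:15 in the last row comes from divmod(2355, 60), which gives (39, 15); passing 39 as the hour to datetime.time is also what raises "hour must be in 0..23". If the values are really HHMM-encoded, as the question's update suggests, dividing by 100 instead separates the hour digits from the minute digits. A minimal sketch under that assumption (hhmm_to_time is a hypothetical name, not from the original answer):

```python
from datetime import time

def hhmm_to_time(x):
    # Assumes x is HHMM-encoded: 2355 means 23:55, 5 means 00:05.
    hours, minutes = divmod(int(x), 100)
    return time(hours, minutes)

print(hhmm_to_time(2355))
print(hhmm_to_time(5))
```

It can be applied to the column the same way as before: df["interval"].apply(hhmm_to_time).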

Upvotes: 2

theB

Reputation: 6738

Just format it as a string.

Method 1 (using the old style):

"%02d:%02d" % (int(interval / 60), interval % 60)

Method 2 (using the newer style):

"{:02d}:{:02d}".format(int(interval / 60), interval % 60)

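To apply this formatting to a whole column rather than a single value, one option is Series.map. The sketch below assumes, like the snippets above, that interval counts minutes since midnight; the sample values and the minutes_to_text helper are illustrative assumptions.

```python
import pandas as pd

# Assumed sample data: minutes since midnight.
df = pd.DataFrame({"interval": [0, 5, 40, 65]})

def minutes_to_text(m):
    # Integer division gives the hour, the remainder gives the minutes.
    return "{:02d}:{:02d}".format(m // 60, m % 60)

df["time"] = df["interval"].map(minutes_to_text)
print(df)
```

If the column is HHMM-encoded instead (0 to 2355), replacing 60 with 100 in both places gives the matching result.
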
Upvotes: 2
