Edgar Smith
Edgar Smith

Reputation: 183

Datetime from two different columns in a DataFrame

I am trying to create a new datetime column in my dataframe from existing separate date and time columns. My function works when I test it by just giving it two strings, but when I try it on a dataframe it doesn't. The code is as follows:

import pandas as pd

def datetime_row(date,time):
    if len(time) == 2:
        datetime = date[-2:]+"-"+date[-4:-2]+"-"+date[0:4]+" 00:00:"+time
    else:
        datetime = date[2:]
    return datetime

df = pandas.read_csv('test.csv',dtype=str)

df['datetime'] = datetime_row(df['date'],df['time'])

when I look at the dataframe altered by the function, this is what it looks like:

    date       time     datetime
0   20161213    07           NaN
1   20161213    08           NaN
2   20161213    09      20161213
3   20161213    010     20161213

if I call the datetime_row function on two strings:

datetime_row("20161312","01")

it works fine, outputting:

'12-13-2016 00:00:01'

I'm wondering what I'm doing wrong here. Just to provide full details, the csv I'm reading from looks as follows:

date,time
20161213,07
20161213,08
20161213,09
20161213,010

Upvotes: 0

Views: 65

Answers (2)

Hugolmn
Hugolmn

Reputation: 1560

Using pd.to_datetime

Instead of having to keep the function datetime_row that can slow your program, I would suggest to directly use pandas.to_datetime

df['datetime'] = pd.to_datetime(df.date+df.time.astype(int).astype(str), format='%Y%m%d%S')

The date can be kept as it is (str), but to better deal with the seconds, I cast into int to eliminate the leading zero, and cast back to str.

The result is the following:

    date        time    datetime
0   20161213    07  2016-12-13 00:00:07
1   20161213    08  2016-12-13 00:00:08
2   20161213    09  2016-12-13 00:00:09
3   20161213    010 2016-12-13 00:00:10

Performance vs apply

%%timeit
df['datetime'] = pd.to_datetime(df.date+df.time.astype(int).astype(str), format='%Y%m%d%S')

>> 32.1 ms ± 637 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit 
df['datetime'] = df.apply(lambda row: datetime_row(row["date"], row["time"]), axis=1)

>> 957 ms ± 25.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Using pd.to_datetime instead of a seperated function using apply is about 30 times faster.

Upvotes: 2

valkyrie
valkyrie

Reputation: 106

You can use pandas apply function for this:

df['datetime'] = df.apply(lambda row: datetime_row(row["date"], row["time"]), axis=1)

Upvotes: 2

Related Questions