Reputation: 183
I am trying to create a new datetime column in my dataframe from existing separate date and time columns. My function works when I test it by just giving it two strings, but when I try it on a dataframe it doesn't. The code is as follows:
import pandas as pd
def datetime_row(date,time):
if len(time) == 2:
datetime = date[-2:]+"-"+date[-4:-2]+"-"+date[0:4]+" 00:00:"+time
else:
datetime = date[2:]
return datetime
df = pandas.read_csv('test.csv',dtype=str)
df['datetime'] = datetime_row(df['date'],df['time'])
when I look at the dataframe altered by the function, this is what it looks like:
date time datetime
0 20161213 07 NaN
1 20161213 08 NaN
2 20161213 09 20161213
3 20161213 010 20161213
if I call the datetime_row function on two strings:
datetime_row("20161312","01")
it works fine, outputting:
'12-13-2016 00:00:01'
I'm wondering what I'm doing wrong here. Just to provide full details, the csv I'm reading from looks as follows:
date,time
20161213,07
20161213,08
20161213,09
20161213,010
Upvotes: 0
Views: 65
Reputation: 1560
pd.to_datetime
Instead of having to keep the function datetime_row
that can slow your program, I would suggest to directly use pandas.to_datetime
df['datetime'] = pd.to_datetime(df.date+df.time.astype(int).astype(str), format='%Y%m%d%S')
The date can be kept as it is (str), but to better deal with the seconds, I cast into int to eliminate the leading zero, and cast back to str.
The result is the following:
date time datetime
0 20161213 07 2016-12-13 00:00:07
1 20161213 08 2016-12-13 00:00:08
2 20161213 09 2016-12-13 00:00:09
3 20161213 010 2016-12-13 00:00:10
apply
%%timeit
df['datetime'] = pd.to_datetime(df.date+df.time.astype(int).astype(str), format='%Y%m%d%S')
>> 32.1 ms ± 637 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
df['datetime'] = df.apply(lambda row: datetime_row(row["date"], row["time"]), axis=1)
>> 957 ms ± 25.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Using pd.to_datetime
instead of a seperated function using apply
is about 30 times faster.
Upvotes: 2
Reputation: 106
You can use pandas apply function for this:
df['datetime'] = df.apply(lambda row: datetime_row(row["date"], row["time"]), axis=1)
Upvotes: 2