Speed up Pandas DateTime variable

Question

I have a number of quite large cvs files (1,000,000 rows each) which contain a DateTime column. I am using Pandas pivot tables to summarise them. Part of what this involves is splitting out this DateTime variable into hours and minutes. I am using the following code, which is working fine, but it is taking quite a lot of time (around 4-5 minutes).

My question is: Is this just because the files are so large/my laptop to slow, or is there a more efficient code that allows me to split out hours and minutes from a DateTime variable?

Thanks

df['hours'], df['minutes'] = pd.DatetimeIndex(df['DateTime']).hour, pd.DatetimeIndex(df['DateTime']).minute

jezrael · Accepted Answer

If dtypes of column Datetime is not datetime, first convert it to_datetime. Then use dt.hour and dt.minute:

df['DateTime'] = pd.to_datetime(df['DateTime'])
df['hours'], df['minutes'] = df['DateTime'].dt.hour, df['DateTime'].dt.minute

Sample:

import pandas as pd

df = pd.DataFrame({'DateTime': ['2014-06-17 11:09:20', '2014-06-18 10:02:10']})
print (df)
              DateTime
0  2014-06-17 11:09:20
1  2014-06-18 10:02:10

print (df.dtypes)
DateTime    object
dtype: object

df['DateTime'] = pd.to_datetime(df['DateTime'])
df['hours'], df['minutes'] = df['DateTime'].dt.hour, df['DateTime'].dt.minute
print (df)
             DateTime  hours  minutes
0 2014-06-17 11:09:20     11        9
1 2014-06-18 10:02:10     10        2

Speed up Pandas DateTime variable

Answers (1)

Related Questions