Reputation: 231
I have a number of quite large cvs files (1,000,000 rows each) which contain a DateTime column. I am using Pandas pivot tables to summarise them. Part of what this involves is splitting out this DateTime variable into hours and minutes. I am using the following code, which is working fine, but it is taking quite a lot of time (around 4-5 minutes).
My question is: Is this just because the files are so large/my laptop to slow, or is there a more efficient code that allows me to split out hours and minutes from a DateTime variable?
Thanks
df['hours'], df['minutes'] = pd.DatetimeIndex(df['DateTime']).hour, pd.DatetimeIndex(df['DateTime']).minute
Upvotes: 1
Views: 598
Reputation: 862771
If dtypes
of column Datetime
is not datetime
, first convert it to_datetime
. Then use dt.hour
and dt.minute
:
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['hours'], df['minutes'] = df['DateTime'].dt.hour, df['DateTime'].dt.minute
Sample:
import pandas as pd
df = pd.DataFrame({'DateTime': ['2014-06-17 11:09:20', '2014-06-18 10:02:10']})
print (df)
DateTime
0 2014-06-17 11:09:20
1 2014-06-18 10:02:10
print (df.dtypes)
DateTime object
dtype: object
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['hours'], df['minutes'] = df['DateTime'].dt.hour, df['DateTime'].dt.minute
print (df)
DateTime hours minutes
0 2014-06-17 11:09:20 11 9
1 2014-06-18 10:02:10 10 2
Upvotes: 1