Reputation: 451
I have 1.6 million rows of data and I need to calculate difference between two dates. I have read the data from an excel file and then looked for the dtypes of columns. The date and time columns both have object datatypes. I need to convert these columns into datetime. I have tried it but it is only working for time but not for date. I have tried the following:
test = pd.read_excel(datadir+'test.xlsx', sheetname = 'Sheet1' )
test['time'] = pd.to_datetime(test['time'], format='%H:%M:%S')
test['date'] = pd.to_datetime(test['date'], format='%d%b%y')
It does work for time but not for date and gives the following error: ValueError: unconverted data remains: 14
I have also tried without specifying format for date but then I get ValueError: Unknown string format
print(len(test['date']))
returns 7
print(test['date'].iloc[0])
returns 17JAN2014.
Here is the sample data: date time 17JAN2014 12:48 18JAN2014 13:15 17JAN2014 09:20 18JAN2014 07:45 04FEB2012 10:00 05FEB2012 17:22 12MAR2012 19:50
I have no clue what is going wrong. Any help will be appreciated. One more thing, Can anyone tells me how do I combine date and time columns into one column as a datetime?
Upvotes: 1
Views: 1304
Reputation: 863301
There is problem with year, need Y
for match YYYY
, y
match YY
.
Check also http://strftime.org/.
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y')
print (df)
date time
0 2014-01-17 12:48
1 2014-01-18 13:15
2 2014-01-17 09:20
3 2014-01-18 07:45
4 2012-02-04 10:00
5 2012-02-05 17:22
6 2012-03-12 19:50
Upvotes: 1