Reputation: 1096
I have an excel file which has 3 columns as either date-time or date or time fields. I am reading it via xlrd
package and I am getting time as milliseconds
I suppose and when I am trying to convert it back to datetime I am getting wrong results.
I tried with converting the file to csv
as well. That too doesn't help and I get weird datetime format which I am not able to comprehend.
Here is what I tried with the xlrd
format. I prefer to use files with .xlrs
extension as input because otherwise I have to convert excel files to .csv
every time I get a new input file.
from xlrd import open_workbook
import os,pickle,datetime
def main(path, filename, absolute_path_organisation_structure):
absolute_filepath = os.path.join(path,filename)
wb = open_workbook(absolute_filepath)
for sheet in wb.sheets():
number_of_rows = sheet.nrows
number_of_columns = sheet.ncols
for row_index in xrange(1, sheet.nrows):
row=[]
for col_index in xrange(4,7): #4th and 6th columns are date fields
row.append(sheet.cell(row_index, col_index).value)
print(row) #Relevant list formed with 4th, 5th and 6th columns
print(datetime.datetime.fromtimestamp(float(row[0])).strftime('%Y-%m-%d %H:%M:%S'))
path = "C:\\Users\\***************\\NEW DATA"
MISfile = "P2P_2015 - Copy.xlsx"
absolute_path_organisation_structure = "C:\\Users\\******************NEW DATA\\organisation.csv"
main(path, MISfile, absolute_path_organisation_structure)
Result:
[42011.46789351852, u'Registered', 42009.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG1 approval', 42010.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent back', 42010.0]
1970-01-01 17:10:11
[42011.46789351852, u'Registered', 42011.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG1 approval', 42011.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG2 approval', 42012.0]
1970-01-01 17:10:11
[42011.46789351852, u'CTG2 Approved', 42012.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent back', 42013.0]
1970-01-01 17:10:11
[42170.61667824074, u'Registered', 42144.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42144.0]
1970-01-01 17:12:50
[42170.61667824074, u'Sent back', 42165.0]
1970-01-01 17:12:50
[42170.61667824074, u'Sent back', 42165.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42170.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42170.0]
1970-01-01 17:12:50
Actual input file: (copied from excel)
1/7/2015 11:13 Registered 1/5/2015 0:00
1/7/2015 11:13 Sent for CTG1 approval 1/6/2015 0:00
1/7/2015 11:13 Sent back 1/6/2015 0:00
1/7/2015 11:13 Registered 1/7/2015 0:00
1/7/2015 11:13 Sent for CTG1 approval 1/7/2015 0:00
1/7/2015 11:13 Sent for CTG2 approval 1/8/2015 0:00
1/7/2015 11:13 CTG2 Approved 1/8/2015 0:00
1/7/2015 11:13 Sent back 1/9/2015 0:00
6/15/2015 14:48 Registered 5/20/2015 0:00
6/15/2015 14:48 Registered 5/20/2015 0:00
6/15/2015 14:48 Sent back 6/10/2015 0:00
6/15/2015 14:48 Sent back 6/10/2015 0:00
6/15/2015 14:48 Registered 6/15/2015 0:00
6/15/2015 14:48 Registered 6/15/2015 0:00
Why am I not able to read dates correctly? Why are they not simply coming up as strings so that I can easily convert them?
Upvotes: 5
Views: 4791
Reputation: 719
If the Excel file to read is a table can be simple and straightforward to use pandas.read_excel. After transforming dates with pandas.to_datetime
from __future__ import absolute_import, division, print_function
import os
import pandas as pd
def main(path, filename, absolute_path_organisation_structure):
absolute_filepath = os.path.join(path,filename)
#Relevant list formed with 4th, 5th and 6th columns
df = pd.read_excel(absolute_filepath, header=None, parse_cols=[4,5,6])
# Transform column 0 and 2 to datetime
df[0] = pd.to_datetime(df[0])
df[2] = pd.to_datetime(df[2])
print(df)
path = "C:\\Users\\***************\\NEW DATA"
MISfile = "P2P_2015 - Copy.xlsx"
main(path, MISfile,None)
Upvotes: 1
Reputation: 5090
xldate_as_tuple(xldate, datemode) [#]
Convert an Excel number (presumed to represent a date, a datetime or a time) into a tuple suitable for feeding to datetime or mx.DateTime constructors.
Source: http://www.lexicon.net/sjmachin/xlrd.html#xlrd.xldate_as_tuple-function
Usage Example: How to use ``xlrd.xldate_as_tuple()``
Upvotes: 2
Reputation: 15545
The issue is that you are interpreting the Excel datetime value as a UNIX timestamp, when they are not the same thing. The warning flag to look for is that the resulting values are all near the UNIX epoch (1970-01-01
).
You can convert from an Excel datetime to UNIX by using the method described in this answer.
Unix Timestamp = (Excel Timestamp - 25569) * 86400
Unix Timestamp = (Excel Timestamp - 24107) * 86400
If you apply this conversion you should get the correct output:
timestamp = (float(row[0]) - 25569) * 86400
datetime.datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
Upvotes: 2