Shivendra
Shivendra

Reputation: 1096

Python - not getting correct datetime upon reading time from excel file

I have an excel file which has 3 columns as either date-time or date or time fields. I am reading it via xlrd package and I am getting time as milliseconds I suppose and when I am trying to convert it back to datetime I am getting wrong results.

I tried with converting the file to csv as well. That too doesn't help and I get weird datetime format which I am not able to comprehend.

Here is what I tried with the xlrd format. I prefer to use files with .xlrs extension as input because otherwise I have to convert excel files to .csv every time I get a new input file.

from xlrd import open_workbook
import os,pickle,datetime

def main(path, filename, absolute_path_organisation_structure):
    absolute_filepath = os.path.join(path,filename)

    wb = open_workbook(absolute_filepath)
    for sheet in wb.sheets():
        number_of_rows = sheet.nrows
        number_of_columns = sheet.ncols

        for row_index in xrange(1, sheet.nrows):
            row=[]
            for col_index in xrange(4,7): #4th and 6th columns are date fields
                row.append(sheet.cell(row_index, col_index).value)

            print(row)  #Relevant list formed with 4th, 5th and 6th columns
            print(datetime.datetime.fromtimestamp(float(row[0])).strftime('%Y-%m-%d %H:%M:%S'))


path = "C:\\Users\\***************\\NEW DATA"
MISfile  = "P2P_2015 - Copy.xlsx"
absolute_path_organisation_structure = "C:\\Users\\******************NEW DATA\\organisation.csv"
main(path, MISfile, absolute_path_organisation_structure)

Result:

[42011.46789351852, u'Registered', 42009.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG1 approval', 42010.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent back', 42010.0]
1970-01-01 17:10:11
[42011.46789351852, u'Registered', 42011.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG1 approval', 42011.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG2 approval', 42012.0]
1970-01-01 17:10:11
[42011.46789351852, u'CTG2 Approved', 42012.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent back', 42013.0]
1970-01-01 17:10:11
[42170.61667824074, u'Registered', 42144.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42144.0]
1970-01-01 17:12:50
[42170.61667824074, u'Sent back', 42165.0]
1970-01-01 17:12:50
[42170.61667824074, u'Sent back', 42165.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42170.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42170.0]
1970-01-01 17:12:50

Actual input file: (copied from excel)

1/7/2015 11:13  Registered  1/5/2015 0:00
1/7/2015 11:13  Sent for CTG1 approval  1/6/2015 0:00
1/7/2015 11:13  Sent back   1/6/2015 0:00
1/7/2015 11:13  Registered  1/7/2015 0:00
1/7/2015 11:13  Sent for CTG1 approval  1/7/2015 0:00
1/7/2015 11:13  Sent for CTG2 approval  1/8/2015 0:00
1/7/2015 11:13  CTG2 Approved   1/8/2015 0:00
1/7/2015 11:13  Sent back   1/9/2015 0:00
6/15/2015 14:48 Registered  5/20/2015 0:00
6/15/2015 14:48 Registered  5/20/2015 0:00
6/15/2015 14:48 Sent back   6/10/2015 0:00
6/15/2015 14:48 Sent back   6/10/2015 0:00
6/15/2015 14:48 Registered  6/15/2015 0:00
6/15/2015 14:48 Registered  6/15/2015 0:00

Why am I not able to read dates correctly? Why are they not simply coming up as strings so that I can easily convert them?

Upvotes: 5

Views: 4791

Answers (3)

jrovegno
jrovegno

Reputation: 719

If the Excel file to read is a table can be simple and straightforward to use pandas.read_excel. After transforming dates with pandas.to_datetime

from __future__ import absolute_import, division, print_function
import os
import pandas as pd

def main(path, filename, absolute_path_organisation_structure):
    absolute_filepath = os.path.join(path,filename)
    #Relevant list formed with 4th, 5th and 6th columns
    df = pd.read_excel(absolute_filepath, header=None, parse_cols=[4,5,6])
    # Transform column 0 and 2 to datetime
    df[0] = pd.to_datetime(df[0])
    df[2] = pd.to_datetime(df[2])
    print(df)

path = "C:\\Users\\***************\\NEW DATA"
MISfile  = "P2P_2015 - Copy.xlsx"
main(path, MISfile,None)

Upvotes: 1

khajvah
khajvah

Reputation: 5090

xldate_as_tuple(xldate, datemode) [#]

Convert an Excel number (presumed to represent a date, a datetime or a time) into a tuple suitable for feeding to datetime or mx.DateTime constructors.

Source: http://www.lexicon.net/sjmachin/xlrd.html#xlrd.xldate_as_tuple-function

Usage Example: How to use ``xlrd.xldate_as_tuple()``

Upvotes: 2

mfitzp
mfitzp

Reputation: 15545

The issue is that you are interpreting the Excel datetime value as a UNIX timestamp, when they are not the same thing. The warning flag to look for is that the resulting values are all near the UNIX epoch (1970-01-01).

You can convert from an Excel datetime to UNIX by using the method described in this answer.

Windows/Mac Excel 2011

Unix Timestamp = (Excel Timestamp - 25569) * 86400

Mac Excel 2007

Unix Timestamp = (Excel Timestamp - 24107) * 86400

If you apply this conversion you should get the correct output:

timestamp = (float(row[0]) - 25569) * 86400
datetime.datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')

Upvotes: 2

Related Questions