Tushar
Tushar

Reputation: 35

Panda read_excel() gives incorrect date value

I have abc.xlsb file having column name "Reporting Date". It is having values "16-Aug-21". I am trying to convert abc.xlsb to a CSV file. But I am getting random value for date "44424"

import pandas as pd 
xls = pd.ExcelFile('abc.xlsb') 
df = pd.read_excel(xls, 'xyz_sheetname') 
print(df['Reporting Date'])

Output :

44424

Expected output :

16-Aug-21

Please help. Thanks in advance.

Upvotes: 0

Views: 843

Answers (1)

Emmanuel Murairi
Emmanuel Murairi

Reputation: 401

44424 represents the number of days passed since 12/31/1899

For example:

1 = 31 Dec 1899
2 = 1 Jan 1900
3 = 2 Jan 1900

So, 44424 would be exactly 16 Aug 2021 as expected.

You can use this logic on pandas to get the actual date before exporting into csv.

You might want to check this: https://www.ablebits.com/office-addins-blog/2019/08/13/google-sheets-change-date-format/

This should work:

from datetime import datetime, timedelta
import pandas as pd

df = pd.DataFrame({"date": [1,2,3,4, 44424]})
df['date'] = df['date'].apply(lambda x: timedelta(x) + datetime(1899,12,30))

print(df)

Upvotes: 2

Related Questions