Reputation: 51
I have a dataset saved in an xls file.
In this dataset there are 4 columns that represent dates, in the format dd/mm/yyyy.
My problem is that when I read it in python using pandas and the function read_excel all the columns are read as string, except one, read as datetime64[ns], also if I specify dtypes={column=str}. Why?
Upvotes: 0
Views: 49
Reputation: 186
Dates in Excel are frequently stored as numbers, which allows you to do things like subtract them, even though they might be displayed as human-readable dates like dd/mm/yyyy. Pandas is handily taking those numbers and interpreting them as dates, which lets you deal with them more flexibly.
To turn them into strings, you can use the converters
argument of pd.read_excel
like so:
df = pd.read_excel(filename, converters={'name_of_date_column': lambda dt: dt.strftime('%d/%m/%Y')})
The strftime method lets you format dates however you like. Specifying a converter for your column lets you apply the function to the data as you read it in.
Upvotes: 1