Reputation: 85
jan_21=[datetime(2021,1,1) + timedelta(hours=i) for i in range(5)]
jan_21
[datetime.datetime(2021, 1, 1, 0, 0),
datetime.datetime(2021, 1, 1, 1, 0),
datetime.datetime(2021, 1, 1, 2, 0),
datetime.datetime(2021, 1, 1, 3, 0),
datetime.datetime(2021, 1, 1, 4, 0)]
prices = np.random.randint(1,100,size=(5,))
prices
[46 23 13 26 52]
df = pd.DataFrame({'datetime':jan_21, 'price':prices})
df
datetime price
0 2021-01-01 00:00:00 83
1 2021-01-01 01:00:00 60
2 2021-01-01 02:00:00 29
3 2021-01-01 03:00:00 97
4 2021-01-01 04:00:00 67
All good so far, this is how I expected the dataframe and datetime values to be displayed. The problem comes when I save the dataframe to an excel file and then read it back into a dataframe, the datetime values get messed up.
df.to_excel('price_data.xlsx', index=False)
new_df = pd.read_excel('price_data.xlsx')
new_df
datetime price
0 2021-01-01 00:00:00.000000 83
1 2021-01-01 00:59:59.999999 60
2 2021-01-01 02:00:00.000001 29
3 2021-01-01 03:00:00.000000 97
4 2021-01-01 03:59:59.999999 67
I'd like df == new_df
to evaluate to True
Upvotes: 2
Views: 1087
Reputation: 763
Against the backdrop of the likely cause of the issue (see sophros' answer), what you could do to - superficiously - circumvent the problem is converting the cells of df["datetime"]
to strings before producing the excel file and then converting the strings to datetime again, after new_df
has been created:
df["datetime"] = df["datetime"].dt.strftime("%m/%d/%Y, %H:%M:%S")
df.to_excel('price_data.xlsx', index=False)
new_df = pd.read_excel('price_data.xlsx')
new_df["datetime"] = pd.to_datetime(new_df["datetime"], format="%m/%d/%Y, %H:%M:%S")
Upvotes: 2
Reputation: 16660
The reason for the difference in time part of 00:59:59.999999
and 02:00:00.000001
and 03:59:59.999999
is most likely related to a slightly different binary representation of date/time types in Excel and Python or pandas.
The time is frequently stored as a float but the difference is when is the 0-th time (e.g. year 1 AC or 1970 - as in Linux; good explanation here). Therefore, the conversion may loose some least significant parts of the date/time and there is not much you can do about it but round it up or using approximate comparisons as with any float.
Upvotes: 0