Reputation: 31
Pandas error when reading date from excel file. I am creating a dataframe using the following command.
df = pd.read_excel("report_file.xls", parse_dates=['operation_date'])
df.dtypes
operation_date datetime64[ns]
Everything looks good. But when analyzing the dataframe, an error was found. After the number of the day matches the number of the month, the pandas is mistaken and reverses the day and month. For example, in October data it looks like this.
45 2021-10-13 11:50:34 ... 329.97
46 2021-10-13 11:41:56 ... 323.50
47 2021-10-13 11:41:55 ... 2600.00
48 2021-10-10 02:05:13 ... 1479.45
49 2021-09-10 20:22:01 ... 40.00
50 2021-09-10 19:39:39 ... 42.64
51 2021-09-10 19:39:39 ... 350.00
52 2021-06-10 20:11:48 ... 20.00
53 2021-06-10 13:34:25 ... 1.96
You can see that after 2021-10-10 day number at the place of month.
Upvotes: 2
Views: 6935
Reputation: 31
I used another way. In my Excel file (report_file.xls
) I have a column with name operation_date
and dates in my table with date d-m-Y.
At first I made a dataframe from file:
df = pd.read_excel('report_file.xls')
df.dtypes # operation_date object
Then I converted the date from a string to a datetime:
df['operation_date'] = pd.to_datetime(df['operation_date'],dayfirst=True)
df.dtypes # operation_date datetime64[ns]
In this case, there were no problems in the date after 2021-10-10 that I described. Here is the code:
df = pd.read_excel("report_file.xls", parse_dates=['operation_date'])
45 2021-10-13 11:50:34 ... 329.97
46 2021-10-13 11:41:56 ... 323.50
47 2021-10-13 11:41:55 ... 2600.00
48 2021-10-10 02:05:13 ... 1479.45
49 2021-10-09 20:22:01 ... 40.00
50 2021-10-09 19:39:39 ... 42.64
51 2021-10-09 19:39:39 ... 350.00
52 2021-10-06 20:11:48 ... 20.00
53 2021-10-06 13:34:25 ... 1.96
And now the date looks correct.
Upvotes: 0
Reputation: 11346
Try passing the date format explicitly, something like this:
pd.read_excel(
"report_file.xls",
parse_dates=['operation_date'],
date_parser=lambda x: pd.to_datetime(x, format='%Y-%m-%d %I:%M:%S')
)
Upvotes: 1