pavlinbl4
pavlinbl4

Reputation: 31

Pandas mistake when reading date from excel file

Pandas error when reading date from excel file. I am creating a dataframe using the following command.

df = pd.read_excel("report_file.xls", parse_dates=['operation_date'])
df.dtypes
operation_date  datetime64[ns]

Everything looks good. But when analyzing the dataframe, an error was found. After the number of the day matches the number of the month, the pandas is mistaken and reverses the day and month. For example, in October data it looks like this.

45 2021-10-13 11:50:34  ...                       329.97
46 2021-10-13 11:41:56  ...                       323.50
47 2021-10-13 11:41:55  ...                      2600.00
48 2021-10-10 02:05:13  ...                      1479.45
49 2021-09-10 20:22:01  ...                        40.00
50 2021-09-10 19:39:39  ...                        42.64
51 2021-09-10 19:39:39  ...                       350.00
52 2021-06-10 20:11:48  ...                        20.00
53 2021-06-10 13:34:25  ...                         1.96

You can see that after 2021-10-10 day number at the place of month.

Upvotes: 2

Views: 6935

Answers (2)

pavlinbl4
pavlinbl4

Reputation: 31

I used another way. In my Excel file (report_file.xls) I have a column with name operation_date and dates in my table with date d-m-Y.

my table

At first I made a dataframe from file:

df = pd.read_excel('report_file.xls')
df.dtypes   # operation_date   object

Then I converted the date from a string to a datetime:

df['operation_date'] = pd.to_datetime(df['operation_date'],dayfirst=True)
df.dtypes   #  operation_date      datetime64[ns]

In this case, there were no problems in the date after 2021-10-10 that I described. Here is the code:

df = pd.read_excel("report_file.xls", parse_dates=['operation_date'])
45 2021-10-13 11:50:34  ...                       329.97
46 2021-10-13 11:41:56  ...                       323.50
47 2021-10-13 11:41:55  ...                      2600.00
48 2021-10-10 02:05:13  ...                      1479.45
49 2021-10-09 20:22:01  ...                        40.00
50 2021-10-09 19:39:39  ...                        42.64
51 2021-10-09 19:39:39  ...                       350.00
52 2021-10-06 20:11:48  ...                        20.00
53 2021-10-06 13:34:25  ...                         1.96

And now the date looks correct.

Upvotes: 0

funnydman
funnydman

Reputation: 11346

Try passing the date format explicitly, something like this:

pd.read_excel(
    "report_file.xls",
    parse_dates=['operation_date'],
    date_parser=lambda x: pd.to_datetime(x, format='%Y-%m-%d %I:%M:%S')
)

Upvotes: 1

Related Questions