Nikit Parakh

Reputation: 75

Pandas DataFrame - mixed date formatting

I am trying to analyse some COVID data I pulled from a CSV file available online: https://api.covid19india.org/csv/latest/tested_numbers_icmr_data.csv

I trimmed it down to only use the Tested As Of and Total Samples Tested columns.

import pandas as pd

df = pd.read_csv("tested_numbers_icmr_data.csv")
df = df[['Total Samples Tested', 'Tested As Of']]

Then renamed the date column, converted to datetime and set it as my index.

df = df.rename(columns={'Tested As Of':'Date'})
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

I printed the dataframe to check if everything was as intended and noticed that some of the date formatting was all mixed up. Here is a snippet:

2020-10-04              161330.0
2020-11-04              179374.0
2020-12-04              195748.0
2020-04-13              217554.0
2020-04-14              244893.0
2020-04-15              274599.0
2020-04-16              302956.0

Inconsistencies like this appear throughout the dataframe. Because of them, some dates seem to be missing when in reality they have just been parsed incorrectly. Is there a way to fix this?

UPDATE: I checked the file manually and the entries are all correctly formatted and consistent.

Upvotes: 0

Views: 641

Answers (2)

robbo

Reputation: 545

If the formatting of the dates in the original file is consistent, you can supply the "format" argument to pd.to_datetime(). It follows the same formatting rules as Python's datetime module: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes

So in your case it might be pd.to_datetime(df['Date'], format="%Y-%m-%d"), with the format string adjusted to match how the dates are written in the file.
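
To illustrate why the format string matters, here is a small sketch using only the standard library's datetime.strptime, which uses the same format codes. The sample strings are an assumption about how the file might write its dates, not values copied from the CSV.

```python
from datetime import datetime

# Hypothetical raw value; the real CSV may write its dates differently.
s = "10/04/2020"

# The same string is a valid date under both readings.
day_first = datetime.strptime(s, "%d/%m/%Y")    # 10 April 2020
month_first = datetime.strptime(s, "%m/%d/%Y")  # 4 October 2020

print(day_first.date())
print(month_first.date())

# Once the day is above 12, only the day-first reading is possible.
try:
    datetime.strptime("13/04/2020", "%m/%d/%Y")
    month_first_ok = True
except ValueError:
    month_first_ok = False

print(month_first_ok)
```

A parser that has to guess can therefore read the first twelve days of a month one way and the remaining days another way, which would produce the kind of jump seen in the question.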

Upvotes: 2

Nikit Parakh

Reputation: 75

Turns out I was just missing the format argument:

df['Date'] = pd.to_datetime(df['Date'], format="%d/%m/%Y")
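
For anyone who wants to check the fix on a small sample first, here is a minimal sketch. The rows are hypothetical: the day/month/year strings are assumed from the format above, and the counts are borrowed from the snippet in the question.

```python
import pandas as pd

# Hypothetical sample; the date strings are assumed to be day/month/year.
raw = pd.DataFrame({
    "Date": ["10/04/2020", "11/04/2020", "12/04/2020", "13/04/2020", "14/04/2020"],
    "Total Samples Tested": [161330.0, 179374.0, 195748.0, 217554.0, 244893.0],
})

# An explicit format removes the day/month ambiguity for days 1 to 12.
raw["Date"] = pd.to_datetime(raw["Date"], format="%d/%m/%Y")
df = raw.set_index("Date")

# Consecutive April days should now be in increasing order, all in month 4.
is_sorted = df.index.is_monotonic_increasing
months = df.index.month.unique().tolist()
print(is_sorted, months)
```

If the index comes out sorted and every row lands in the expected month, the format string matches the data.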

Upvotes: 0
