Reputation: 75
I am trying to analyse some COVID data I pulled from a CSV file available online - https://api.covid19india.org/csv/latest/tested_numbers_icmr_data.csv
I trimmed it down to only use the Tested As Of and Total Samples Tested columns.
import pandas as pd
df = pd.read_csv("tested_numbers_icmr_data.csv")
df = df[['Total Samples Tested', 'Tested As Of']]
Then renamed the date column, converted to datetime and set it as my index.
df = df.rename(columns={'Tested As Of':'Date'})
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
I printed the dataframe to check if everything was as intended and noticed that some of the date formatting was all mixed up. Here is a snippet:
2020-10-04 161330.0
2020-11-04 179374.0
2020-12-04 195748.0
2020-04-13 217554.0
2020-04-14 244893.0
2020-04-15 274599.0
2020-04-16 302956.0
These kinds of inconsistencies are all over the dataframe. Because the day and month are swapped in some rows, some dates seem to be missing when in reality they have just been parsed wrongly. Is there a solution to this?
UPDATE: I checked the file manually and the entries are all correctly formatted and consistent.
Upvotes: 0
Views: 641
Reputation: 545
If the formatting of the dates in the original file is consistent, you can pass the "format" argument to pd.to_datetime(). It follows the same formatting rules as Python's datetime module: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
So in your case it might be pd.to_datetime(df['Date'], format="%Y-%m-%d"), with the format string adjusted to match how the dates are written in the file.
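As a rough illustration of why the format string matters, here is a minimal sketch. It assumes the file writes dates as day/month/year (for example 10/04/2020 for 10 April); the sample strings below are made up for the demonstration and are not read from the actual CSV.

```python
import pandas as pd

# Hypothetical sample strings, assumed to be written day/month/year.
raw = pd.Series(["10/04/2020", "11/04/2020", "12/04/2020", "13/04/2020"])

# With an explicit day-first format, every row is read the same way.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

# For contrast: the first string read as month/day/year becomes 4 October,
# which is the kind of swapped date shown in the question.
misread = pd.to_datetime(raw.iloc[0], format="%m/%d/%Y")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
print(misread.strftime("%Y-%m-%d"))
```

A string such as 13/04/2020 can only be read day-first, because there is no month 13, while 10/04/2020 is ambiguous. That is why only dates with a day of 12 or lower end up swapped when no format is given.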
Upvotes: 2
Reputation: 75
Turns out I was just missing the format argument:
df['Date'] = pd.to_datetime(df['Date'], format="%d/%m/%Y")
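For anyone wanting to check the result, here is a small end-to-end sketch of the same steps with a sanity check on the index. The in-memory CSV is a stand-in for the real file: the column names and sample counts come from the question, but the day/month/year date strings are an assumption about how the file stores them.

```python
import io
import pandas as pd

# Stand-in for the downloaded file (assumed day/month/year date strings).
csv_text = """Tested As Of,Total Samples Tested
10/04/2020,161330
11/04/2020,179374
12/04/2020,195748
13/04/2020,217554
"""

df = pd.read_csv(io.StringIO(csv_text),
                 usecols=["Tested As Of", "Total Samples Tested"])
df = df.rename(columns={"Tested As Of": "Date"})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.set_index("Date")

# Sanity check: consecutive rows should be in order and one day apart
# if the day and month were read correctly.
gaps = df.index.to_series().diff().dropna()
print(df.index.is_monotonic_increasing)
print(gaps.unique())
```

If the index is not increasing, or the gaps jump by months, that is a hint that the day and month are still being swapped somewhere.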
Upvotes: 0