Reputation: 277
I have been recently working with python and I have found an issue I can't seem to solve. I am using a pandas dataset and when I want to change the dtype of a variable from 'object' to 'datetime64' using the to_datetime function, it does not changes it to the desired 'datetime64' dtype.
So far I only tried the to_datetime function, but that can't seem to solve the problem. I am looking for a solution to make to_datetime work or any other code that could change my variable's dtype from 'object' to 'datetime64'
Here you can find info about the dataset:
df.head()
Formatted Date Summary Precip Type Temperature (C) Apparent Temperature (C) Humidity Wind Speed (km/h) Wind Bearing (degrees) Visibility (km) Loud Cover Pressure (millibars) Daily Summary
0 2006-04-01 00:00:00.000 +0200 Partly Cloudy rain 9.472222 7.388889 0.89 14.1197 251.0 15.8263 0.0 1015.13 Partly cloudy throughout the day.
1 2006-04-01 01:00:00.000 +0200 Partly Cloudy rain 9.355556 7.227778 0.86 14.2646 259.0 15.8263 0.0 1015.63 Partly cloudy throughout the day.
2 2006-04-01 02:00:00.000 +0200 Mostly Cloudy rain 9.377778 9.377778 0.89 3.9284 204.0 14.9569 0.0 1015.94 Partly cloudy throughout the day.
3 2006-04-01 03:00:00.000 +0200 Partly Cloudy rain 8.288889 5.944444 0.83 14.1036 269.0 15.8263 0.0 1016.41 Partly cloudy throughout the day.
4 2006-04-01 04:00:00.000 +0200 Mostly Cloudy rain 8.755556 6.977778 0.83 11.0446 259.0 15.8263 0.0 1016.51 Partly cloudy throughout the day.
And here you can see the dtypes before using the to_datetime function:
df.dtypes
Formatted Date object
Summary object
Precip Type object
Temperature (C) float64
Apparent Temperature (C) float64
Humidity float64
Wind Speed (km/h) float64
Wind Bearing (degrees) float64
Visibility (km) float64
Loud Cover float64
Pressure (millibars) float64
Daily Summary object
dtype: object
Here after using the to_datetime function:
df['Date'] = pd.to_datetime(df['Formatted Date'])
df.dtypes
Formatted Date object
Summary object
Precip Type object
Temperature (C) float64
Apparent Temperature (C) float64
Humidity float64
Wind Speed (km/h) float64
Wind Bearing (degrees) float64
Visibility (km) float64
Loud Cover float64
Pressure (millibars) float64
Daily Summary object
Date object
dtype: object
Can you tell me what I'm doing wrong? Thanks in advance!
Upvotes: 3
Views: 5954
Reputation: 314
For pandas>=0.24
you need to add the parameter utc=True
.
import pandas as pd
# load dataset
df = pd.read_csv('weatherHistory.csv')
df['Date'] = df['Formatted Date'].apply(pd.to_datetime, utc=True)
df.dtypes
Formatted Date object
Summary object
Precip Type object
Temperature (C) float64
Apparent Temperature (C) float64
Humidity float64
Wind Speed (km/h) float64
Wind Bearing (degrees) float64
Visibility (km) float64
Loud Cover float64
Pressure (millibars) float64
Daily Summary object
Date datetime64[ns, UTC]
Upvotes: 6
Reputation: 708
I've been in trouble with pandas and acessing elements by column labels. I made a reduced version of your dataframe and could change the column dataype using the location of the column by index.
Try to change your:
pd.to_datetime(df['Formatted Date'])
to:
pd.to_datetime(df.iloc[0])
It worked for me:
data=['2006-04-01 00:00:00.000 +0200']
df = pd.DataFrame(data)
df2 = pd.to_datetime(df.iloc[0])
print(df2.dtypes)
The output was:
datetime64[ns, pytz.FixedOffset(120)]
I downloaded the same data you are using and I think it may be a possible solution for your dataset, just extending your original code to deal with date format:
df['Date'] = pd.to_datetime(df['Formatted Date'], format = '%Y-%m-%d %H:%M:%S.%f %p', errors= 'coerce')
As you can see the column "Date" has the correct datatype now:
Formatted Date object
Summary object
Precip Type object
Temperature (C) float64
Apparent Temperature (C) float64
Humidity float64
Wind Speed (km/h) float64
Wind Bearing (degrees) float64
Visibility (km) float64
Loud Cover float64
Pressure (millibars) float64
Daily Summary object
Date datetime64[ns]
Upvotes: 2
Reputation: 314
You want to change the dtype
value from object
to datetime64
.
df = pd.DataFrame(data={'col':["2006-04-01 00:00:00.000 +0200"]})
df.dtypes
Output:
col object
dtype: object
To change the type, you need to apply pd.to_datetime
.
df['col'] = df['col'].apply(pd.to_datetime)
df.dtypes
Output:
col datetime64[ns, pytz.FixedOffset(120)]
dtype: object
If this does not work, then your column Formatted Date
might contain inconsistent date formats or NaN
values.
Using your dataset (https://www.kaggle.com/budincsevity/szeged-weather/):
import pandas as pd
# load dataset
df = pd.read_csv('weatherHistory.csv')
df.dtypes
Formatted Date object
Summary object
Precip Type object
Temperature (C) float64
Apparent Temperature (C) float64
Humidity float64
Wind Speed (km/h) float64
Wind Bearing (degrees) float64
Visibility (km) float64
Loud Cover float64
Pressure (millibars) float64
Daily Summary object
dtype: object
df['Date'] = df['Formatted Date'].apply(pd.to_datetime)
df.dtypes
Formatted Date object
Summary object
Precip Type object
Temperature (C) float64
Apparent Temperature (C) float64
Humidity float64
Wind Speed (km/h) float64
Wind Bearing (degrees) float64
Visibility (km) float64
Loud Cover float64
Pressure (millibars) float64
Daily Summary object
Date datetime64[ns]
dtype: object
Upvotes: 2