eagerstudent
eagerstudent

Reputation: 277

Pandas' to_datetime function doesn't change dtype

I have been recently working with python and I have found an issue I can't seem to solve. I am using a pandas dataset and when I want to change the dtype of a variable from 'object' to 'datetime64' using the to_datetime function, it does not changes it to the desired 'datetime64' dtype.

So far I only tried the to_datetime function, but that can't seem to solve the problem. I am looking for a solution to make to_datetime work or any other code that could change my variable's dtype from 'object' to 'datetime64'

Here you can find info about the dataset:

df.head()
Formatted Date                      Summary  Precip Type Temperature (C)   Apparent Temperature (C)   Humidity   Wind Speed (km/h)   Wind Bearing (degrees)  Visibility (km)  Loud Cover Pressure (millibars)   Daily Summary
0   2006-04-01 00:00:00.000 +0200   Partly Cloudy   rain    9.472222    7.388889    0.89    14.1197     251.0   15.8263     0.0     1015.13     Partly cloudy throughout the day.
1   2006-04-01 01:00:00.000 +0200   Partly Cloudy   rain    9.355556    7.227778    0.86    14.2646     259.0   15.8263     0.0     1015.63     Partly cloudy throughout the day.
2   2006-04-01 02:00:00.000 +0200   Mostly Cloudy   rain    9.377778    9.377778    0.89    3.9284  204.0   14.9569     0.0     1015.94     Partly cloudy throughout the day.
3   2006-04-01 03:00:00.000 +0200   Partly Cloudy   rain    8.288889    5.944444    0.83    14.1036     269.0   15.8263     0.0     1016.41     Partly cloudy throughout the day.
4   2006-04-01 04:00:00.000 +0200   Mostly Cloudy   rain    8.755556    6.977778    0.83    11.0446     259.0   15.8263     0.0     1016.51     Partly cloudy throughout the day.

And here you can see the dtypes before using the to_datetime function:

df.dtypes
Formatted Date               object
Summary                      object
Precip Type                  object
Temperature (C)             float64
Apparent Temperature (C)    float64
Humidity                    float64
Wind Speed (km/h)           float64
Wind Bearing (degrees)      float64
Visibility (km)             float64
Loud Cover                  float64
Pressure (millibars)        float64
Daily Summary                object
dtype: object

Here after using the to_datetime function:

df['Date'] = pd.to_datetime(df['Formatted Date'])
df.dtypes

Formatted Date               object
Summary                      object
Precip Type                  object
Temperature (C)             float64
Apparent Temperature (C)    float64
Humidity                    float64
Wind Speed (km/h)           float64
Wind Bearing (degrees)      float64
Visibility (km)             float64
Loud Cover                  float64
Pressure (millibars)        float64
Daily Summary                object
Date                         object
dtype: object

Can you tell me what I'm doing wrong? Thanks in advance!

Upvotes: 3

Views: 5954

Answers (3)

yascho
yascho

Reputation: 314

For pandas>=0.24 you need to add the parameter utc=True.

import pandas as pd

# load dataset
df = pd.read_csv('weatherHistory.csv')

df['Date'] = df['Formatted Date'].apply(pd.to_datetime, utc=True)
df.dtypes
Formatted Date                           object
Summary                                  object
Precip Type                              object
Temperature (C)                         float64
Apparent Temperature (C)                float64
Humidity                                float64
Wind Speed (km/h)                       float64
Wind Bearing (degrees)                  float64
Visibility (km)                         float64
Loud Cover                              float64
Pressure (millibars)                    float64
Daily Summary                            object
Date                        datetime64[ns, UTC]

Upvotes: 6

powerPixie
powerPixie

Reputation: 708

I've been in trouble with pandas and acessing elements by column labels. I made a reduced version of your dataframe and could change the column dataype using the location of the column by index.

Try to change your:

 pd.to_datetime(df['Formatted Date'])

to:

  pd.to_datetime(df.iloc[0])

It worked for me:

  data=['2006-04-01 00:00:00.000 +0200']

  df = pd.DataFrame(data)

  df2 = pd.to_datetime(df.iloc[0])

  print(df2.dtypes)

The output was:

  datetime64[ns, pytz.FixedOffset(120)]

I downloaded the same data you are using and I think it may be a possible solution for your dataset, just extending your original code to deal with date format:

  df['Date'] = pd.to_datetime(df['Formatted Date'], format = '%Y-%m-%d %H:%M:%S.%f %p', errors= 'coerce')

As you can see the column "Date" has the correct datatype now:

Formatted Date                      object
Summary                             object
Precip Type                         object
Temperature (C)                    float64
Apparent Temperature (C)           float64
Humidity                           float64
Wind Speed (km/h)                  float64
Wind Bearing (degrees)             float64
Visibility (km)                    float64
Loud Cover                         float64
Pressure (millibars)               float64
Daily Summary                       object
Date                        datetime64[ns]

Upvotes: 2

yascho
yascho

Reputation: 314

Problem

You want to change the dtype value from object to datetime64.

df = pd.DataFrame(data={'col':["2006-04-01 00:00:00.000 +0200"]})
df.dtypes

Output:

col    object
dtype: object

Solution

To change the type, you need to apply pd.to_datetime.

df['col'] = df['col'].apply(pd.to_datetime)
df.dtypes

Output:

col    datetime64[ns, pytz.FixedOffset(120)]
dtype: object

If this does not work, then your column Formatted Date might contain inconsistent date formats or NaN values.

Real data

Using your dataset (https://www.kaggle.com/budincsevity/szeged-weather/):

import pandas as pd

# load dataset
df = pd.read_csv('weatherHistory.csv')
df.dtypes
Formatted Date               object
Summary                      object
Precip Type                  object
Temperature (C)             float64
Apparent Temperature (C)    float64
Humidity                    float64
Wind Speed (km/h)           float64
Wind Bearing (degrees)      float64
Visibility (km)             float64
Loud Cover                  float64
Pressure (millibars)        float64
Daily Summary                object
dtype: object
df['Date'] = df['Formatted Date'].apply(pd.to_datetime)
df.dtypes
Formatted Date                      object
Summary                             object
Precip Type                         object
Temperature (C)                    float64
Apparent Temperature (C)           float64
Humidity                           float64
Wind Speed (km/h)                  float64
Wind Bearing (degrees)             float64
Visibility (km)                    float64
Loud Cover                         float64
Pressure (millibars)               float64
Daily Summary                       object
Date                        datetime64[ns]
dtype: object

Upvotes: 2

Related Questions