Tuturuu
Tuturuu

Reputation: 61

Convert Tz-aware datetime.datetime to datetime64/DatetimeIndex/Timestamp

I am using a PostgreSQL database to store and retrieve data. The columns I get are:

  1. date : timestamp with timezone
  2. value: double precision
  3. type : character varying (approximate geographical location of the data)
  4. code: character varying (code related to the exact geographical location)

This is how I get my data using pd.read_sql_query:

df     = pd.read_sql_query("SELECT date, value FROM my_db WHERE code ='%s' and type = '%s' and date >= '%s' and date <= '%s' ORDER BY date;"
                            % (code, type, start_date, end_date), con=Cdc.engine, index_col="date")
df["value"].replace(to_replace=0, value=np.nan, inplace=True)  # replace 0 by NaN since 0 is an error in my data

Then I try to do the following, and this is where I get my error:

df = df[df.index.month.isin(wanted_month)]

Which returns me the error:

AttributeError: 'Index' object has no attribute 'month'

1. I tried to fix this by doing:

self.df_cdc.index = pd.to_datetime(self.df_cdc.index)

But it does not work since I get this error:

ValueError: Array must be all same time zone
During handling of the above exception, another exception occurred:
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

2. I also tried to do:

df = df[df.index.dt.month.isin(wanted_month)]

But I get this error:

AttributeError: 'Index' object has no attribute 'dt'

Here is an example of the index values of my DF:

#   2019-12-30 21:10:00+01:00 --> <class 'datetime.datetime'>
#   2019-10-11 15:00:00+02:00 --> <class 'datetime.datetime'>

Those are french data, so we have daylight saving time with +1 hour in winter and +2 in summer compared to UTC (e.g. : winter: UTC = 10h00 --> France = 11h00; summer: UTC = 10h00 --> France = 12h00).

Displaying tzinfo gives me this :

print(df.index[0], "-->", df.index[0].tzinfo)
print(df.index[15000], "-->", df.index[15000].tzinfo
2019-01-01 00:00:00+01:00 --> psycopg2.tz.FixedOffsetTimezone(offset=60, name=None)
2019-04-15 05:00:00+02:00 --> psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)

Thank you for your attention.

EDIT: I found the solution and posted it below.

Upvotes: 1

Views: 7681

Answers (1)

Tuturuu
Tuturuu

Reputation: 61

I fixed the problem by converting first to UTC, and then back to my timezone (I verified and the dates and times are correct:

    df.index = pd.to_datetime(self.df_cdc.index, utc=True)
    df.index = self.df_cdc.index.tz_convert("Europe/Paris")

This is the complete query:

df     = pd.read_sql_query("SELECT date, value FROM my_db WHERE code ='%s' and type = '%s' and date >= '%s' and date <= '%s' ORDER BY date;"
                                % (code, type, start_date, end_date), con=Cdc.engine, index_col="date")
df["value"].replace(to_replace=0, value=np.nan, inplace=True)  # replace 0 by NaN since 0 is an error in my data
df.index = pd.to_datetime(self.df_cdc.index, utc=True)
df.index = self.df_cdc.index.tz_convert("Europe/Paris")

Upvotes: 5

Related Questions