Reputation: 61
I am using a PostgreSQL database to store and retrieve data. The columns I get are:
This is how I get my data using pd.read_sql_query
:
df = pd.read_sql_query("SELECT date, value FROM my_db WHERE code ='%s' and type = '%s' and date >= '%s' and date <= '%s' ORDER BY date;"
% (code, type, start_date, end_date), con=Cdc.engine, index_col="date")
df["value"].replace(to_replace=0, value=np.nan, inplace=True) # replace 0 by NaN since 0 is an error in my data
Then I try to do the following, and this is where I get my error:
df = df[df.index.month.isin(wanted_month)]
Which returns me the error:
AttributeError: 'Index' object has no attribute 'month'
1. I tried to fix this by doing:
self.df_cdc.index = pd.to_datetime(self.df_cdc.index)
But it does not work since I get this error:
ValueError: Array must be all same time zone
During handling of the above exception, another exception occurred:
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
2. I also tried to do:
df = df[df.index.dt.month.isin(wanted_month)]
But I get this error:
AttributeError: 'Index' object has no attribute 'dt'
Here is an example of the index values of my DF:
# 2019-12-30 21:10:00+01:00 --> <class 'datetime.datetime'>
# 2019-10-11 15:00:00+02:00 --> <class 'datetime.datetime'>
Those are french data, so we have daylight saving time with +1 hour in winter and +2 in summer compared to UTC (e.g. : winter: UTC = 10h00 --> France = 11h00; summer: UTC = 10h00 --> France = 12h00
).
Displaying tzinfo gives me this :
print(df.index[0], "-->", df.index[0].tzinfo)
print(df.index[15000], "-->", df.index[15000].tzinfo
2019-01-01 00:00:00+01:00 --> psycopg2.tz.FixedOffsetTimezone(offset=60, name=None)
2019-04-15 05:00:00+02:00 --> psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
Thank you for your attention.
EDIT: I found the solution and posted it below.
Upvotes: 1
Views: 7681
Reputation: 61
I fixed the problem by converting first to UTC, and then back to my timezone (I verified and the dates and times are correct:
df.index = pd.to_datetime(self.df_cdc.index, utc=True)
df.index = self.df_cdc.index.tz_convert("Europe/Paris")
This is the complete query:
df = pd.read_sql_query("SELECT date, value FROM my_db WHERE code ='%s' and type = '%s' and date >= '%s' and date <= '%s' ORDER BY date;"
% (code, type, start_date, end_date), con=Cdc.engine, index_col="date")
df["value"].replace(to_replace=0, value=np.nan, inplace=True) # replace 0 by NaN since 0 is an error in my data
df.index = pd.to_datetime(self.df_cdc.index, utc=True)
df.index = self.df_cdc.index.tz_convert("Europe/Paris")
Upvotes: 5