Reputation: 1970
The solution to the question asked here unfortunately does not solve this problem. I'm using Python 3.6.2
The Dataframe, df
:
date1 date2
rec0 2017-05-25 14:02:23+00:00 2017-05-25 14:34:43+00:00
rec1 NaT 2017-05-16 19:37:43+00:00
To reproduce the problem:
import psycopg2
import pandas as pd
Timestamp = pd.Timestamp
NaT = pd.NaT
df = pd.DataFrame({'date1': [Timestamp('2017-05-25 14:02:23'), NaT],
'date2': [Timestamp('2017-05-25 14:34:43'), Timestamp('2017-05-16 19:37:43')]})
tz = psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)
for col in ['date1', 'date2']:
df[col] = pd.DatetimeIndex(df[col]).tz_localize(tz)
print(df.max(axis=1))
Both of the above columns have been converted using pd.to_datetime()
to get the following column type: datetime64[ns, psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)]
Running df.max(axis=1)
doesn't give an error but certainly provides the incorrect solution.
Output (incorrect):
rec0 NaN
rec1 NaN
dtype: float64
The fix that I have in place is to apply
a custom function to the df as written below:
def get_max(x):
test = x.dropna()
return max(test)
df.apply(get_max,axis=1)
Output (correct):
rec0 2017-05-25 14:34:43+00:00
rec1 2017-05-16 19:37:43+00:00
dtype: datetime64[ns, psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)]
Maybe df.max()
doesn't deal with date objects but only looks for floats (docs). Any idea why df.max(axis=1)
only returns NaN
?
Upvotes: 2
Views: 2752
Reputation: 205
Using Pandas 1.0.5 with Python 3.8 I was still getting a series of Nans. Solved the issue by converting both columns to datetime and then adding skipna=True and numeric_only=False to the max() function:
df['1'] = pd.to_datetime(df['1'], utc=True)
df['2'] = pd.to_datetime(df['2'], utc=True)
df['3'] = df[['1', '2']].max(axis=1, skipna=True, numeric_only=False)
Upvotes: 0
Reputation: 3138
After some testing, it looks like there is something wrong with pandas
and psycopg2.tz.FixedOffsetTimezone
.
If you try df.max(axis=0)
it will work as expected, but as you indicate df.max(axis=1)
will return a series of NaN
.
If you do not use psycopg2.tz.FixedOffsetTimezone
as tz
, df.max(axis=1)
will return the expected result.
Other manipulations will fail in this case, such as df.transpose
.
Note that if you try df.values.max(axis=1)
, you will get the expected result. So numpy.array
seems to be able to deal with this.
You should search in pandas
Github issues (like this one) and maybe consider opening a new one if you can't find a fix.
Another solution would be to drop psycopg2.tz.FixedOffsetTimezone
, but you may have some reason to use this specifically.
Upvotes: 1