famargar

Reputation: 3448

Pandas find minimum of date (NOT DATETIME) column

I have a pandas dataframe df where a column Datetime is a datetime object. If I do:

df['Datetime'].min()

pandas returns the correct answer: the earliest date available. But if I want to work with date objects instead of datetimes, and so create a Date column of datetime.date objects, as in:

df['Date'] = df['Datetime'].dt.date
df['Date'].min()

I get back

TypeError: '<=' not supported between instances of 'float' and 'datetime.date'

Is that a pandas bug? How can I work around it? I am using Python 3.6 and pandas 0.20.3.

Upvotes: 0

Views: 6512

Answers (1)

jezrael

Reputation: 862641

In pandas 0.23.0 your code works fine as long as there are no NaT values:

import numpy as np
import pandas as pd

rng = pd.date_range('2017-04-03 14:10:01', periods=10, freq='15H')
df = pd.DataFrame({'Datetime': rng, 'a': range(10)})  

df['Date'] = df['Datetime'].dt.date
print (df['Date'].min())

2017-04-03

But if the column contains a NaT value, the same call fails:

rng = pd.date_range('2017-04-03 14:10:01', periods=10, freq='15H')
df = pd.DataFrame({'Datetime': rng, 'a': range(10)})  
df.loc[len(df), 'Date'] = np.nan  # append an all-missing row (Datetime becomes NaT)
df['Date'] = df['Datetime'].dt.date

print (df)
              Datetime    a        Date
0  2017-04-03 14:10:01  0.0  2017-04-03
1  2017-04-04 05:10:01  1.0  2017-04-04
2  2017-04-04 20:10:01  2.0  2017-04-04
3  2017-04-05 11:10:01  3.0  2017-04-05
4  2017-04-06 02:10:01  4.0  2017-04-06
5  2017-04-06 17:10:01  5.0  2017-04-06
6  2017-04-07 08:10:01  6.0  2017-04-07
7  2017-04-07 23:10:01  7.0  2017-04-07
8  2017-04-08 14:10:01  8.0  2017-04-08
9  2017-04-09 05:10:01  9.0  2017-04-09
10                 NaT  NaN         NaT

print (df['Date'].min())

TypeError: unorderable types: datetime.date() <= float()
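
The message suggests that a float ends up being compared with a date object. I have not traced the pandas internals, so treat this as one plausible reading rather than a confirmed cause, but the comparison itself is easy to reproduce in plain Python. The names missing and day below are only illustrative:

```python
import datetime

# Assumption: the missing entry is represented as a float NaN at the
# point where the comparison happens.
missing = float('nan')
day = datetime.date(2017, 4, 3)

try:
    missing <= day
    comparison_failed = False
except TypeError as exc:
    # Python refuses to order a float against a datetime.date
    comparison_failed = True
    message = str(exc)

print(comparison_failed)
```

The exact wording of the message depends on the Python version, which is probably why the question and this answer show different texts for the same error.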


Solutions that also work with NaT values:

print (min(df['Date'].tolist()))
#alternative
#print (min(df['Date'].values))

2017-04-03
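
If you would rather not rely on how NaT compares with date objects (this may differ between pandas versions and may depend on where the NaT sits in the column), a more defensive sketch is to drop the missing values first. The sample data mirrors the frame above; building the column with pd.to_datetime and a Timedelta frequency is my own choice here, not something taken from the question:

```python
import datetime
import pandas as pd

# Ten timestamps 15 hours apart, plus one missing value at the end
rng = pd.date_range('2017-04-03 14:10:01', periods=10,
                    freq=pd.Timedelta(hours=15))
df = pd.DataFrame({'Datetime': pd.to_datetime(list(rng) + [pd.NaT])})

# Object column of datetime.date values; the missing row stays NaT
df['Date'] = df['Datetime'].dt.date

# Drop the missing entries, then take the minimum of the remaining dates
earliest = min(df['Date'].dropna().tolist())
print(earliest)
```

After dropna only datetime.date objects remain, so the built-in min compares like with like.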

Another solution:

Use floor to round down to whole days, which returns datetimes instead of dates:

df['Date'] = df['Datetime'].dt.floor('d')

Sample:

rng = pd.date_range('2017-04-03 14:10:01', periods=10, freq='15H')
df = pd.DataFrame({'Datetime': rng, 'a': range(10)})  

df['Date'] = df['Datetime'].dt.floor('d')
print (df)
             Datetime  a       Date
0 2017-04-03 14:10:01  0 2017-04-03
1 2017-04-04 05:10:01  1 2017-04-04
2 2017-04-04 20:10:01  2 2017-04-04
3 2017-04-05 11:10:01  3 2017-04-05
4 2017-04-06 02:10:01  4 2017-04-06
5 2017-04-06 17:10:01  5 2017-04-06
6 2017-04-07 08:10:01  6 2017-04-07
7 2017-04-07 23:10:01  7 2017-04-07
8 2017-04-08 14:10:01  8 2017-04-08
9 2017-04-09 05:10:01  9 2017-04-09

print (df['Datetime'].min())
2017-04-03 14:10:01

print (df['Datetime'].min().date())
2017-04-03
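
For completeness, a small sketch of the same idea with a NaT in the column. Because the floored column keeps its datetime dtype, min skips the missing value by default, and the result can be converted to a date afterwards. The variable names earliest and earliest_date are only illustrative:

```python
import datetime
import pandas as pd

# Same sample data as above, with one missing timestamp appended
rng = pd.date_range('2017-04-03 14:10:01', periods=10,
                    freq=pd.Timedelta(hours=15))
df = pd.DataFrame({'Datetime': pd.to_datetime(list(rng) + [pd.NaT])})

# Floor to midnight; the column stays datetime64 and NaT is preserved
df['Date'] = df['Datetime'].dt.floor('D')

# min ignores NaT for datetime columns (skipna defaults to True)
earliest = df['Date'].min()

# Convert the single Timestamp result to a datetime.date
earliest_date = earliest.date()
print(earliest, earliest_date)
```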

Upvotes: 2
