sami

Reputation: 551

pandas: filtering datetime columns that include None

I have a pandas DataFrame with two datetime columns, t1 and t2. I need to filter the DataFrame down to the rows where t1 <= t2, and t2 may be missing (None/NaN).

Before pandas 0.19.0 I could do this:

import numpy as np
import pandas as pd
from datetime import datetime
dt = datetime.utcnow()
dt64 = np.datetime64(dt)
df = pd.DataFrame([(dt64,None)], columns=['t1','t2'])
df[(df.t1<=df.t2)]

After upgrading to pandas 0.19.0, this code fails:

Traceback (most recent call last):
  File "workspace/python/MyTests/test1.py", line 87, in <module>
    testDfTimeCompare()
  File "workspace/python/MyTests/test1.py", line 80, in testDfTimeCompare
    df[(df.t1<=df.t2)]
  File "anaconda/lib/python2.7/site-packages/pandas/core/ops.py", line 813, in wrapper
    return self._constructor(na_op(self.values, other.values),
  File "anaconda/lib/python2.7/site-packages/pandas/core/ops.py", line 787, in na_op
    y = y.view('i8')
  File "anaconda/lib/python2.7/site-packages/numpy/core/_internal.py", line 367, in _view_is_safe
    raise TypeError("Cannot change data-type for object array.")
TypeError: Cannot change data-type for object array.

What is the best way to achieve this?

Upvotes: 2

Views: 1512

Answers (3)

Fabio Mendes Soares

Reputation: 1405

I solved this by explicitly setting the type of the columns concerned.

df.t1=df.t1.astype(datetime)
df.t2=df.t2.astype(datetime)
>>> df[(df.t1<=df.t2)]

Empty DataFrame
Columns: [t1, t2]
Index: []
>>> df

                           t1    t2
0  2020-02-29 11:00:18.825597  None

I am using pandas 0.19.2.

Upvotes: 0

jezrael

Reputation: 863301

I think you need to convert column t2 with to_datetime so that None is cast to NaT. Then you can use the function Series.le, which is equivalent to <=:

df.t2 = pd.to_datetime(df.t2)
print (df)
                          t1  t2
0 2016-11-04 07:24:53.372838 NaT

mask = df.t1.le(df.t2)
print (mask)
0    False
dtype: bool

mask = df.t1 <= df.t2
print (mask)
0    False
dtype: bool
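
As a self-contained illustration of the same idea, here is a minimal sketch that goes one step further and applies the mask. The three-row sample data and the names kept and dropped are made up for this example and are not from the question; it assumes a reasonably recent pandas in which comparisons against NaT evaluate to False.

```python
import pandas as pd

# Hypothetical sample data: a missing t2, a row with t1 <= t2, and a row with t1 > t2.
df = pd.DataFrame({
    "t1": [pd.Timestamp("2016-11-04 07:24:53"),
           pd.Timestamp("2016-11-04 08:00:00"),
           pd.Timestamp("2016-11-04 09:00:00")],
    "t2": [None,
           pd.Timestamp("2016-11-04 08:30:00"),
           pd.Timestamp("2016-11-04 08:00:00")],
})

# Make sure both columns are real datetime columns; None becomes NaT.
df["t1"] = pd.to_datetime(df["t1"])
df["t2"] = pd.to_datetime(df["t2"])

# Any comparison against NaT is False, so the row with a missing t2 is not matched.
mask = df["t1"] <= df["t2"]

kept = df[mask]      # rows where t1 <= t2
dropped = df[~mask]  # everything else, including rows where t2 is NaT
```

If rows with a missing t2 should count as matches instead, combining the mask with `df["t2"].isna()` (for example `mask | df["t2"].isna()`) is one way to do it.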

Upvotes: 3

sriramkumar

Reputation: 164

Build a mask like this:

mask = ((df <= 0).cumsum() > 0).any()
>>> mask
t1    False
t2     True
dtype: bool

Upvotes: 2
