Reputation: 551
I have a pandas DataFrame with two datetime columns, t1 and t2. I need to filter out all rows in the DataFrame where t1 <= t2; t2 could be NaN.
Before pandas 0.19.0 I could do this:
import pandas as pd
import numpy as np
from datetime import datetime
dt = datetime.utcnow()
dt64 = np.datetime64(dt)
df = pd.DataFrame([(dt64, None)], columns=['t1', 't2'])
df[(df.t1 <= df.t2)]
After pandas 0.19.0 this code fails:
Traceback (most recent call last):
File "workspace/python/MyTests/test1.py", line 87, in <module>
testDfTimeCompare()
File "workspace/python/MyTests/test1.py", line 80, in testDfTimeCompare
df[(df.t1<=df.t2)]
File "anaconda/lib/python2.7/site-packages/pandas/core/ops.py", line 813, in wrapper
return self._constructor(na_op(self.values, other.values),
File "anaconda/lib/python2.7/site-packages/pandas/core/ops.py", line 787, in na_op
y = y.view('i8')
File "anaconda/lib/python2.7/site-packages/numpy/core/_internal.py", line 367, in _view_is_safe
raise TypeError("Cannot change data-type for object array.")
TypeError: Cannot change data-type for object array.
What is the best way to achieve this?
Upvotes: 2
Views: 1512
Reputation: 1405
I solved this by explicitly setting the type of the affected columns:
df.t1=df.t1.astype(datetime)
df.t2=df.t2.astype(datetime)
>>> df[(df.t1<=df.t2)]
Empty DataFrame
Columns: [t1, t2]
Index: []
>>> df
t1 t2
0 2020-02-29 11:00:18.825597 None
I am using pandas 0.19.2.
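For reference, a self-contained sketch of this approach, reconstructing the DataFrame from the question (the explicit `'datetime64[ns]'` dtype string is my choice; the answer above passes the `datetime` class, which pandas maps to the same dtype):

```python
import pandas as pd
import numpy as np
from datetime import datetime

# Rebuild the DataFrame from the question: t2 holds None
dt64 = np.datetime64(datetime.utcnow())
df = pd.DataFrame([(dt64, None)], columns=['t1', 't2'])

# Explicitly cast both columns; None in t2 becomes NaT
df['t1'] = df['t1'].astype('datetime64[ns]')
df['t2'] = df['t2'].astype('datetime64[ns]')

# Comparisons against NaT evaluate to False, so the row is excluded
filtered = df[df.t1 <= df.t2]
print(filtered)
```

Once both columns share the datetime64 dtype, the original `df[(df.t1 <= df.t2)]` expression works without the `TypeError`.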
Upvotes: 0
Reputation: 863301
I think you need to convert column t2 with to_datetime to cast None to NaT; then you can use the faster function Series.le, which is the same as <=:
df.t2 = pd.to_datetime(df.t2)
print (df)
t1 t2
0 2016-11-04 07:24:53.372838 NaT
mask = df.t1.le(df.t2)
print (mask)
0 False
dtype: bool
mask = df.t1 <= df.t2
print (mask)
0 False
dtype: bool
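Putting it together with the data from the question, a minimal end-to-end sketch (variable names `mask` and `result` are mine):

```python
import pandas as pd
import numpy as np
from datetime import datetime

# DataFrame from the question: t2 holds None
df = pd.DataFrame([(np.datetime64(datetime.utcnow()), None)],
                  columns=['t1', 't2'])

# Convert t2 so that None becomes NaT
df.t2 = pd.to_datetime(df.t2)

mask = df.t1.le(df.t2)   # equivalent to df.t1 <= df.t2
result = df[mask]        # NaT compares as False, so this row is dropped
print(result)
```

Rows where t2 is NaT compare as False and are therefore filtered out, which matches the behavior the question relied on before 0.19.0.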
Upvotes: 3
Reputation: 164
You can build a mask like this:
mask = ((df <= 0).cumsum() > 0).any()
>>> mask
t1 False
t2 True
dtype: bool
Upvotes: 2