Reputation: 1069
import pandas as pd

d = {'Dates': [pd.Timestamp('2013-01-02'),
               pd.Timestamp('2013-01-03'),
               pd.Timestamp('2013-01-04')],
     'Num1': [1, 2, 3],
     'Num2': [-1, -2, -3]}
df = pd.DataFrame(data=d)
Dates Num1 Num2
0 2013-01-02 00:00:00 1 -1
1 2013-01-03 00:00:00 2 -2
2 2013-01-04 00:00:00 3 -3
Dates datetime64[ns]
Num1 int64
Num2 int64
dtype: object
df['Dates'].isin([pd.Timestamp('2013-01-04')])
0 False
1 False
2 False
Name: Dates, dtype: bool
I am expecting True for the date "2013-01-04". What am I missing? I am using the latest version of pandas (0.12).
Upvotes: 17
Views: 11920
Reputation: 11
For some reason, when your dates carry a time component, the comparison does not match correctly. Try:
df['Dates'] = df['Dates'].dt.normalize()
df['Dates'].isin([pd.Timestamp('2013-01-04')])
You will lose the time component of your datetimes, but if the time does not matter, this does work :).
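To illustrate when normalizing matters, here is a minimal sketch. The sample timestamps with non-midnight times are made up for illustration (the question's data is all at midnight), and it assumes a pandas version recent enough to have the `.dt` accessor.

```python
import pandas as pd

# Illustrative data: these time-of-day values are an assumption, not from the question.
dates = pd.Series(pd.to_datetime(["2013-01-03 08:15", "2013-01-04 17:30"]))
target = [pd.Timestamp("2013-01-04")]  # midnight on 2013-01-04

# Exact comparison: 17:30 is not midnight, so nothing matches.
raw_mask = dates.isin(target)

# Normalize to midnight first, so the comparison is by calendar day only.
day_mask = dates.dt.normalize().isin(target)
```

Because the normalized values are only used to build the mask, the original column keeps its time component.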
Upvotes: 1
Reputation: 51
I found using strings worked better in my case:
df['Dates'].isin(['2013-01-04'])
0 False
1 False
2 True
Name: Dates, dtype: bool
df_qry = df['Dates'][df['Num1']>=2]
1 2013-01-03
2 2013-01-04
Name: Dates, dtype: datetime64[ns]
df_mask = df['Dates'].isin(df_qry.astype(str))
0 False
1 True
2 True
Name: Dates, dtype: bool
df[df_mask]
Dates Num1 Num2
1 2013-01-03 2 -2
2 2013-01-04 3 -3
Just a side note: this was super handy for setting rangebreaks on Plotly time series, like:
fig.update_yaxes(rangebreaks=[dict(values=df.index[df_mask].astype(str))])
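If you would rather not rely on strings being parsed implicitly, one option is to convert them yourself before calling isin. This is only a sketch: the frame is rebuilt from the question's data, and `wanted` and `selected` are names made up here. As far as I know, some newer pandas releases warn about matching strings against a datetime column, but treat that as an assumption and check your version.

```python
import pandas as pd

# Same data as in the question, rebuilt so the example is self-contained.
df = pd.DataFrame({
    "Dates": pd.to_datetime(["2013-01-02", "2013-01-03", "2013-01-04"]),
    "Num1": [1, 2, 3],
    "Num2": [-1, -2, -3],
})

# Parse the strings once so both sides of isin are datetimes.
wanted = pd.to_datetime(["2013-01-04"])
mask = df["Dates"].isin(wanted)

# Boolean indexing keeps only the matching rows.
selected = df[mask]
```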
Upvotes: 0
Reputation: 21
I have the same version of pandas, and @DSM's answer was helpful. Another workaround would be to use the apply method:
>>> df.Dates.apply(lambda date: date in [pd.Timestamp('2013-01-04')])
0 False
1 False
2 True
Name: Dates, dtype: bool
Upvotes: 2
Reputation: 1338
This worked for me.
df['Dates'].isin(np.array([pd.Timestamp('2013-01-04')]).astype('datetime64[ns]'))
It is a bit verbose, but it helps if you just need to make this work. See https://github.com/pydata/pandas/issues/5021 for more details.
Upvotes: 3
Reputation: 353209
Yep, that looks like a bug to me. It comes down to this part of lib.ismember:
for i in range(n):
    val = util.get_value_at(arr, i)
    if val in values:
        result[i] = 1
    else:
        result[i] = 0
val is a numpy.datetime64 object, and values is a set of Timestamp objects. Testing membership should work, but doesn't:
>>> import pandas as pd, numpy as np
>>> ts = pd.Timestamp('2013-01-04')
>>> ts
Timestamp('2013-01-04 00:00:00', tz=None)
>>> dt64 = np.datetime64(ts)
>>> dt64
numpy.datetime64('2013-01-03T19:00:00.000000-0500')
>>> dt64 == ts
True
>>> dt64 in [ts]
True
>>> dt64 in {ts}
False
I think usually that behaviour -- working in a list, not working in a set -- is due to something going wrong with __hash__:
>>> hash(dt64)
1357257600000000
>>> hash(ts)
-7276108168457487299
You can't do membership testing in a set if the hashes aren't the same. I can think of a few ways to fix this, but choosing the best one would depend upon design choices they made when implementing Timestamps that I'm not qualified to comment on.
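The list-versus-set difference can be reproduced without pandas or numpy. The sketch below uses two hypothetical classes, `Stamp` and `FixedStamp`, invented here as stand-ins. It is not how Timestamp is implemented. It only shows that equal objects with different hashes are found in a list but not in a set.

```python
class Stamp:
    """Hypothetical stand-in for a timestamp wrapper; not a pandas class."""

    def __init__(self, ns):
        self.ns = ns

    def __eq__(self, other):
        # Compare equal to another Stamp or to a plain int with the same value.
        if isinstance(other, Stamp):
            return self.ns == other.ns
        if isinstance(other, int):
            return self.ns == other
        return NotImplemented

    def __hash__(self):
        # Deliberately inconsistent with hash(int): equal values, different hashes.
        return hash(self.ns) + 1


class FixedStamp(Stamp):
    def __hash__(self):
        # Consistent with __eq__: hashes the same as the int it compares equal to.
        return hash(self.ns)


raw = 1357257600000000000  # plays the role of the raw datetime64 value

in_list = raw in [Stamp(raw)]           # list membership only uses ==
in_set = raw in {Stamp(raw)}            # set membership checks the hash first
in_fixed_set = raw in {FixedStamp(raw)}

print(in_list, in_set, in_fixed_set)    # True False True
```

Making the hashes agree, as `FixedStamp` does, is one possible direction for a fix. Whether that suits Timestamp depends on the design choices mentioned above.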
Upvotes: 1