Reputation: 179
Has anyone encountered a case like the one below? If we let a be a Timestamp and b be a datetime64, then comparing a < b is fine, but b < a returns an error.
If a can be compared to b, I thought we should be able to compare the other way around?
For example (Python 2.7):
>>> a
Timestamp('2013-03-24 05:32:00')
>>> b
numpy.datetime64('2013-03-23T05:33:00.000000000')
>>> a < b
False
>>> b < a
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "pandas\_libs\tslib.pyx", line 1080, in pandas._libs.tslib._Timestamp.__richcmp__ (pandas\_libs\tslib.c:20281)
TypeError: Cannot compare type 'Timestamp' with type 'long'
Many thanks in advance!
Upvotes: 4
Views: 852
Reputation: 1431
That's an interesting question. I've done some digging around and did my best to explain some of this, although one thing I still don't get is why it is pandas, and not numpy, that throws the error when we do b < a.
Regarding your question:
If a can be compared to b, I thought we should be able to compare the other way around?
That's not necessarily true. It depends on how the comparison operators are implemented on each type.
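To make "it depends on the implementation" concrete, here is a simplified sketch of the order in which Python 3 tries methods when evaluating x < y. The function name emulate_lt is hypothetical, and the sketch deliberately ignores one rule: if the right operand's type is a subclass of the left operand's type, its reflected method is tried first. The question used Python 2.7, whose fallback rules for mixed types were looser.

```python
def emulate_lt(x, y):
    """Hypothetical, simplified model of how Python 3 evaluates x < y.

    Ignores the rule that gives a right operand whose type subclasses the
    left operand's type the first try with its reflected method.
    """
    # 1. Ask the left operand first: x.__lt__(y)
    result = type(x).__lt__(x, y)
    if result is not NotImplemented:
        return result
    # 2. The left operand declined, so try the reflected method: y.__gt__(x)
    result = type(y).__gt__(y, x)
    if result is not NotImplemented:
        return result
    # 3. Neither side knows how to compare these types
    raise TypeError(
        f"'<' not supported between instances of "
        f"'{type(x).__name__}' and '{type(y).__name__}'"
    )
```

Because each side decides independently whether to answer or decline, x < y and y > x can end up running completely different code, and one of them can fail while the other succeeds.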
Take this test class for example:
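class TestCom(int):
    def __init__(self, a):
        # int.__new__ has already stored the integer value; keep a copy as well
        self.value = a

    def __gt__(self, other):
        # always claim "greater than", whatever other is
        print('TestComp __gt__ called')
        return True

    def __eq__(self, other):
        # ordinary value equality
        return self.value == other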
Here I have defined my __gt__ (>) method to always return True no matter what the other value is, while __eq__ (==) just compares the stored value as usual.
Now check out the following comparisons:
a = TestCom(9)
print(a)
# Output: 9

# uses my definition of __gt__
a > 100
# Output: TestComp __gt__ called
# True

a > '100'
# Output: TestComp __gt__ called
# True

# this will not use my definition of __gt__
'100' > a
# TypeError: '>' not supported between instances of 'str' and 'TestCom'
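The same one-sidedness can be reproduced in a few lines with a class that only knows how to compare itself against ints, in one direction. This mirrors the shape of the question (a < b works, b < a fails) but is only an illustration. OnlyKnowsInts is a hypothetical class, not how pandas is implemented.

```python
class OnlyKnowsInts:
    """Hypothetical class that can only answer `self < some_int`."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # Handles `self < some_int`; any other type is declined.
        if isinstance(other, int):
            return self.value < other
        return NotImplemented
    # No __gt__ is defined, so `some_int < OnlyKnowsInts(...)` has no reflected fallback.


x = OnlyKnowsInts(3)
print(x < 5)  # True: OnlyKnowsInts.__lt__ answers directly
print(5 > x)  # True: int declines, Python reflects this to x.__lt__(5)

try:
    5 < x  # int declines, and the reflected x.__gt__(5) is object's default, which declines too
except TypeError as exc:
    print('TypeError:', exc)
```

Note that 5 > x still works because Python reflects it to x.__lt__(5). Only 5 < x, whose reflection needs a __gt__ the class never defined, fails.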
So, going back to your case. Looking at the Timestamp source code, the only explanation I can think of is that pandas.Timestamp does some type checking and converts the other operand when it can.
When we compare a with b (pd.Timestamp against np.datetime64), the Timestamp.__richcmp__ function does the comparison: if the other operand is of type np.datetime64, it converts it to pd.Timestamp and then compares.
import numpy as np
import pandas as pd

# we can do the following to get a working comparison of, say, b > a
# convert a to np.datetime64 (.asm8 is equivalent to .to_datetime64())
b > a.asm8
# or convert b to datetime64[ms]
b.astype('datetime64[ms]') > a
# or convert b to a Timestamp
pd.to_datetime(b) > a
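All three workarounds share one idea: make both operands the same type before comparing, so it no longer matters which operand's comparison method runs first. A minimal sketch of that idea as a helper, assuming pandas and numpy are installed; ts_less_than is a hypothetical name, and the values are the ones from the question.

```python
import numpy as np
import pandas as pd

a = pd.Timestamp('2013-03-24 05:32:00')
b = np.datetime64('2013-03-23T05:33:00.000000000')


def ts_less_than(x, y):
    """Hypothetical helper: normalise both operands to pd.Timestamp first,
    so the comparison is Timestamp vs Timestamp in either order."""
    return pd.Timestamp(x) < pd.Timestamp(y)


print(ts_less_than(a, b))  # False, matching a < b in the question
print(ts_less_than(b, a))  # True, b is the earlier moment
```

Converting to pd.Timestamp rather than to a coarser datetime64 unit keeps nanosecond precision, at the cost of being limited to the Timestamp range (roughly years 1677 to 2262).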
What I found surprising is this: I thought the issue was that Timestamp has no nanoseconds, yet even if you do the following, the comparison between np.datetime64 and pd.Timestamp still fails.
a = pd.Timestamp('2013-03-24 05:32:00.00000001')
a.nanosecond # returns 10
# doing the comparison again where they're both ns still fails
b < a
Looking at the source code, it seems we can use the == and != operators. But even they don't work as expected. Take a look at the following example:
a = pd.Timestamp('2013-03-24 05:32:00.00000000')
b = np.datetime64('2013-03-24 05:32:00.00000000', 'ns')
b == a # returns False
a == b # returns True
I think this is the result of lines 149-152 or 163-166, where they return False if you're using == and True for !=, without actually comparing the values.
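The pattern described above (for an unrecognised operand type, answer == with False and != with True without looking at the value, but refuse ordering comparisons) can be sketched in plain Python. MiniTimestamp is a hypothetical, heavily simplified stand-in and not the pandas implementation.

```python
import operator


class MiniTimestamp:
    """Hypothetical stand-in for a timestamp type (not pandas code)."""

    def __init__(self, ns):
        self.ns = ns  # nanoseconds since the epoch

    def _richcmp(self, other, op):
        if isinstance(other, MiniTimestamp):
            # Same type: compare the underlying integers.
            return op(self.ns, other.ns)
        # Unknown type: answer == / != without looking at the value ...
        if op is operator.eq:
            return False
        if op is operator.ne:
            return True
        # ... and refuse ordering comparisons outright.
        raise TypeError(
            f"Cannot compare type 'MiniTimestamp' with type '{type(other).__name__}'"
        )

    def __eq__(self, other):
        return self._richcmp(other, operator.eq)

    def __ne__(self, other):
        return self._richcmp(other, operator.ne)

    def __lt__(self, other):
        return self._richcmp(other, operator.lt)

    def __gt__(self, other):
        return self._richcmp(other, operator.gt)


t = MiniTimestamp(42)
print(t == 42)                # False: the int's value is never inspected
print(t != 42)                # True
print(t == MiniTimestamp(42)) # True: same type, values are compared
```

Because __lt__ raises instead of returning NotImplemented, Python never gets the chance to try the other operand's reflected method. That is how an error message naming only the Timestamp side can surface.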
Edit:
The nanosecond feature was added in version 0.23.0, so you can do something like pd.Timestamp('2013-03-23T05:33:00.000000022', unit='ns'). So yes, when you compare against np.datetime64, it will be converted to pd.Timestamp with ns precision.
Just note that pd.Timestamp is supposed to be a replacement for Python's datetime:
Timestamp is the pandas equivalent of python's Datetime and is interchangeable with it in most cases.
But Python's datetime doesn't support nanoseconds (there is a good SO answer explaining why). pd.Timestamp does support comparison between the two even if your Timestamp has nanoseconds in it: when you compare a datetime object against a pd.Timestamp object with ns, pandas has _compare_outside_nanorange to do the comparison.
Going back to np.datetime64, one thing to note, as explained nicely in this SO post, is that it's a wrapper on an int64 type. So it's not surprising that both of the following:
1 > a
a > 1
throw the error Cannot compare type 'Timestamp' with type 'int'.
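The int64 underneath is easy to see with numpy alone. One detail that seems relevant: when a nanosecond-unit datetime64 is turned into a Python object, numpy returns a plain int, because datetime.datetime has no nanosecond field. With a coarser unit such as ms, you get a datetime.datetime instead. This would be consistent with the 'long' in the question's traceback (Python 2 on Windows) and with why the astype('datetime64[ms]') workaround helps, but that link is an inference, not something confirmed in the pandas or numpy source here.

```python
import numpy as np

# 9 fractional digits, so numpy infers the unit 'ns'
b = np.datetime64('2013-03-23T05:33:00.000000000')

print(b.dtype)            # datetime64[ns]
print(b.astype('int64'))  # the raw int64: nanoseconds since 1970-01-01T00:00

# datetime.datetime cannot hold nanoseconds, so for unit 'ns' numpy hands back a plain int
print(type(b.item()))     # <class 'int'>

# with a coarser unit the same moment fits in a datetime.datetime
print(type(b.astype('datetime64[ms]').item()))  # <class 'datetime.datetime'>
```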
So under the hood, when you do b > a, the comparison must be done at the int level, by the np.greater() function (also take a look at the ufunc docs).
Note: I'm unable to confirm this; the numpy docs are too complex to go through. If any numpy experts can comment on this, that would be helpful.
If this is the case, and the comparison of np.datetime64 is based on int, then the example above with a == b and b == a makes sense. When we do b == a, we compare the int value of b against a pd.Timestamp, which will always return False for == and True for !=.
It's the same as doing, say, 123 == '123': the operation will not fail, it will just return False.
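The asymmetry between equality and ordering is visible with plain Python 3 types. When both sides decline ==, Python falls back to an identity check and quietly answers False (and != answers True). Ordering operators have no such fallback, so the same pair raises. Python 2.7, used in the question, still allowed arbitrary ordering between int and str, so the last part behaves differently there.

```python
# Equality between unrelated types never raises: both sides decline,
# so Python falls back to an identity check.
print(123 == '123')  # False
print(123 != '123')  # True

# Ordering has no fallback, so the same pair raises in Python 3.
try:
    123 < '123'
except TypeError as exc:
    print('TypeError:', exc)
```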
Upvotes: 1