Min

Reputation: 179

Strange behaviour when comparing Timestamp and datetime64 in Python2.7

Has anyone encountered a case like the one below? If a is a Timestamp and b is a datetime64, then comparing a < b works fine, but b < a raises an error.

If a can be compared to b, I thought we should be able to compare the other way around?

For example (Python 2.7):

>>> a
Timestamp('2013-03-24 05:32:00')
>>> b
numpy.datetime64('2013-03-23T05:33:00.000000000')
>>> a < b
False
>>> b < a
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "pandas\_libs\tslib.pyx", line 1080, in pandas._libs.tslib._Timestamp.__richcmp__ (pandas\_libs\tslib.c:20281)
TypeError: Cannot compare type 'Timestamp' with type 'long'

Many thanks in advance!

Upvotes: 4

Views: 852

Answers (1)

gyx-hh

Reputation: 1431

That's an interesting question. I did some digging and have done my best to explain some of this, although one thing I still don't get is why the error comes from pandas rather than numpy when we do b < a.

Regarding your question:

If a can be compared to b, I thought we should be able to compare the other way around?

That's not necessarily true. It depends on how the comparison operators are implemented.

Take this test class for example:

class TestCom(int):
    def __init__(self, a):
        self.value = a

    def __gt__(self, other):
        print('TestComp __gt__ called')
        return True

    def __eq__(self, other):
        # compare the stored value (the attribute is `value`, not `a`)
        return self.value == other

Here I have defined my __gt__ (>) method to always return True no matter what the other value is, while __eq__ (==) keeps its usual meaning.

Now check the following comparisons out:

a = TestCom(9)
print(a)
# Output: 9

# my def of __gt__
a > 100

# Output: TestComp __gt__ called
# True

a > '100'
# Output: TestComp __gt__ called
# True

'100' > a

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-486-8aee1b1d2500> in <module>()
      1 # this will not use my def of __ge__
----> 2 '100' > a

TypeError: '>' not supported between instances of 'str' and 'TestCom'
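What happens here can be sketched with a small Python 3 example. Python first asks the left operand, and only if that returns NotImplemented does it try the reflected method on the right operand. The class name Meters is made up for illustration; it is not taken from pandas or numpy.

```python
class Meters:
    """Hypothetical type that only knows how to compare itself with numbers."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        if isinstance(other, (int, float)):
            return self.value < other
        return NotImplemented  # tell Python "I can't handle this operand"

    def __gt__(self, other):
        if isinstance(other, (int, float)):
            return self.value > other
        return NotImplemented


m = Meters(5)

# Left operand handles it directly: Meters.__lt__(m, 10)
left = m < 10

# int.__gt__(10, m) returns NotImplemented, so Python falls back to the
# reflected method on the right operand: Meters.__lt__(m, 10)
right = 10 > m

# Neither Meters.__lt__ nor str.__gt__ accepts the other type,
# so Python 3 raises TypeError.
try:
    m < "10"
    failed = False
except TypeError:
    failed = True

print(left, right, failed)
```

So whether x < y and y > x both work depends on what each side's method is willing to accept, not on any built-in symmetry.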

So, going back to your case. Looking at the Timestamp source code (timestamps_sourceCode), the only thing I can think of is that pandas.Timestamp does some type checking and converts the other operand where possible.

When we compare a with b (pd.Timestamp against np.datetime64), the Timestamp.__richcmp__ function does the comparison: if the other operand is of type np.datetime64, it converts it to pd.Timestamp and then compares.

# we can do the following to have a comparison of say b > a
# this converts a to np.datetime64 - .asm8 is equivalent to .to_datetime64()
b > a.asm8

# or we can convert b to datetime64[ms]
b.astype('datetime64[ms]') > a

# or convert to timestamp
pd.to_datetime(b) > a
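If you want something that does not depend on operand order, one option is to convert both sides to pd.Timestamp first. This is only a sketch; the helper name is_before is made up for illustration, and it assumes both inputs are values that pd.Timestamp() accepts.

```python
import numpy as np
import pandas as pd


def is_before(left, right):
    """Hypothetical helper: True if left is strictly earlier than right.

    Both operands are converted to pd.Timestamp first, so the comparison
    always runs between two objects of the same type.
    """
    return pd.Timestamp(left) < pd.Timestamp(right)


a = pd.Timestamp('2013-03-24 05:32:00')
b = np.datetime64('2013-03-23T05:33:00.000000000')

b_before_a = is_before(b, a)  # b is the earlier moment
a_before_b = is_before(a, b)

print(b_before_a, a_before_b)
```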

What I found surprising, since I thought the issue was that the Timestamp had no nanoseconds, is that even if you do the following, the comparison of np.datetime64 with pd.Timestamp still fails.

a = pd.Timestamp('2013-03-24 05:32:00.00000001')
a.nanosecond   # returns 10
# doing the comparison again where they're both ns still fails
b < a

Looking at the source code, it seems like we can use the == and != operators. But even they don't work as expected. Take a look at the following example:

a = pd.Timestamp('2013-03-24 05:32:00.00000000')
b = np.datetime64('2013-03-24 05:32:00.00000000', 'ns')

b == a  # returns False

a == b  # returns True

I think this is the result of lines 149-152 or 163-166, which return False if you're using == and True for !=, without actually comparing the values.

Edit: The nanosecond feature was added in version 0.23.0, so you can do something like pd.Timestamp('2013-03-23T05:33:00.000000022', unit='ns'). So yes, when you compare against np.datetime64, it will be converted to a pd.Timestamp with ns precision.

Just note that pd.Timestamp is supposed to be a replacement for Python's datetime:

Timestamp is the pandas equivalent of python's Datetime and is interchangeable with it in most cases.

But Python's datetime doesn't support nanoseconds; there is a good answer explaining why here: SO_Datetime. pd.Timestamp supports comparison between the two even if your Timestamp has nanoseconds in it. When you compare a datetime object against a pd.Timestamp object with ns, pandas uses _compare_outside_nanorange to do the comparison.

Going back to np.datetime64, one thing to note, as explained nicely in this SO post, is that it's a wrapper around an int64 type. So it's not surprising what happens if I do the following:

1 > a
a > 1

Both will throw the error Cannot compare type 'Timestamp' with type 'int'.
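To see the "wrapper around an int64" point concretely, here is a small sketch using only numpy. It shows that a datetime64[ns] value can be read as an integer count of nanoseconds since the Unix epoch and rebuilt from that integer. The variable names are my own.

```python
import numpy as np

# One second after the epoch, stored with nanosecond resolution
one_second = np.datetime64('1970-01-01T00:00:01', 'ns')
ticks = int(one_second.astype('int64'))  # nanoseconds since 1970-01-01

# The value from the question, round-tripped through its integer form
b = np.datetime64('2013-03-23T05:33:00', 'ns')
b_ticks = int(b.astype('int64'))
rebuilt = np.datetime64(b_ticks, 'ns')

print(ticks)         # 1000000000
print(rebuilt == b)  # True
```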

So under the hood, when you do b > a, the comparison must be done at the int level. This comparison is done by the np.greater() function (np.greater); also take a look at ufunc_docs.

Note: I'm unable to confirm this, as the numpy docs are too complex to go through. If any numpy experts can comment on this, that would be helpful.

If this is the case, and the comparison of np.datetime64 is based on int, then the example above with a == b and b == a makes sense: when we do b == a, we compare the int value of b against a pd.Timestamp, which will always return False for == and True for !=.

It's the same as doing, say, 123 == '123': this operation will not fail, it will just return False.
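The "returns False instead of failing" behaviour of == is plain Python 3 and can be sketched without pandas or numpy. The Ticks class below is hypothetical. When both operands decline an equality test by returning NotImplemented, Python falls back to an identity check rather than raising.

```python
class Ticks:
    """Hypothetical wrapper around an integer count."""

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        if isinstance(other, Ticks):
            return self.n == other.n
        return NotImplemented  # decline to compare with unknown types


t = Ticks(123)

# Both Ticks.__eq__ and str.__eq__ decline, so Python falls back to
# "are these the same object?", which is False. No exception is raised.
eq_result = (t == '123')
ne_result = (t != '123')

# Built-in types behave the same way
builtin_eq = (123 == '123')

print(eq_result, ne_result, builtin_eq)
```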

Upvotes: 1
