Slow performance of pandas timestamp vs datetime

Question

I seem to be running into unexpectedly slow performance of arithmetic operations on pandas.Timestamp vs python regular datetime() objects.
Here is a benchmark that demonstrates:

import datetime
import pandas
import numpy

# using datetime:
def test1():
    d1 = datetime.datetime(2015, 3, 20, 10, 0, 0)
    d2 = datetime.datetime(2015, 3, 20, 10, 0, 15)
    delta = datetime.timedelta(minutes=30)

    count = 0
    for i in range(500000):
        if d2 - d1 > delta:
            count += 1

# using pandas:
def test2():
    d1 = pandas.datetime(2015, 3, 20, 10, 0, 0)
    d2 = pandas.datetime(2015, 3, 20, 10, 0, 15)
    delta = pandas.Timedelta(minutes=30)

    count = 0
    for i in range(500000):
        if d2 - d1 > delta:
            count += 1

# using numpy
def test3():
    d1 = numpy.datetime64('2015-03-20 10:00:00')
    d2 = numpy.datetime64('2015-03-20 10:00:15')
    delta = numpy.timedelta64(30, 'm')

    count = 0
    for i in range(500000):
        if d2 - d1 > delta:
            count += 1


  time1 = datetime.datetime.now()
  test1()
  time2 = datetime.datetime.now()
  test2()
  time3 = datetime.datetime.now()
  test3()
  time4 = datetime.datetime.now()

  print('DELTA test1: ' + str(time2-time1))
  print('DELTA test2: ' + str(time3-time2))
  print('DELTA test3: ' + str(time4-time3))

And corresponding results on my machine (python3.3, pandas 0.15.2):

DELTA test1: 0:00:00.131698
DELTA test2: 0:00:10.034970
DELTA test3: 0:00:05.233389

Is this expected?
Are there ways to eliminate the performance problem other than switching code to Python's default datetime implementation as much as possible?

Slow performance of pandas timestamp vs datetime

Answers (1)

Related Questions