Zhen Lin
Zhen Lin

Reputation: 119

Optimising `__getitem__` and `__setitem__` for a sparse array in Python

I'm writing my own sparse (one-dimensional) array class but I'm running into some performance issues. Profiling suggests that one of the bottlenecks is my __getitem__ and __setitem__ implementations, and in particular, it seems like one of the culprits might be my use of isinstance. At the moment I have 5 calls to isinstance in __getitem__ and I get the following data from cProfile (excerpted):

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    86462    0.076    0.000    0.084    0.000 sparse.py:107(__setitem__)
   189730    0.147    0.000    0.166    0.000 sparse.py:45(__getitem__)
   276366    0.028    0.000    0.028    0.000 {built-in method isinstance}

My __getitem__ implements slicing as well as array access, so I suspect some kind of type introspection is necessary... but I'm wondering if isinstance is really the best way to do that?

My __setitem__, on the other hand, doesn't support slicing (and only calls isinstance once in any case), so I'm at a loss as to how I could make it faster. The per-line profiling data is as follows:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   108                                              @profile
   109                                              def __setitem__(self, key, value):
   110     88705       121012      1.4     23.0         if not isinstance(key, int):
   111                                                      raise TypeError('list indices must be be integers')
   112                                                  
   113     88705        95905      1.1     18.3         if key >= self._length:
   114                                                      raise IndexError('list index out of range')
   115                                                  
   116     88705        85328      1.0     16.2         if key < 0:
   117                                                      key = self._length + key
   118                                                  
   119     88705        89186      1.0     17.0         if value == self._default:
   120     35043        37087      1.1      7.1             if key in self._entries:
   121     35042        39359      1.1      7.5                 del self._entries[key]
   122                                                  else:
   123     53662        57527      1.1     10.9             self._entries[key] = value

(I'm also willing to accept an answer suggesting a suitable fast sparse array Python module. One of my requirements is the ability to quickly iterate over (the keys of) non-zero entries only.)

Upvotes: 1

Views: 2164

Answers (4)

kindall
kindall

Reputation: 184365

To answer your immediate question, isinstance() is a slow call because the name is global. You can speed it up significantly simply by adding isinstance=isinstance to __setitem__()'s signature, like so:

def __setitem__(self, key, value, isinstance=isinstance):
    # und so weiter

This converts the global name to a local name, which is significantly faster to look up at run time. As a bonus, the local name is bound to the built-in isinstance function at function definition time, so there's no overhead initializing the variable when it's called.

As others have pointed out, however, in the code you've shown, you probably don't even need that call, but can simply try to convert the key to an int, or skip even that. (However, you could get a little speed boost out of adding int=int to your method signature, as int is also a global name...)

But if you're going to do the error-checking, you should also test to see if the index is less than zero. What if the length is 50 and the user wants item -100? :-)

Upvotes: 5

primroot
primroot

Reputation: 1686

How about getting rid of the exceptions?


def __setitem__(self, key, value):
   # This checks that:
   # - key is an integer (or can be converted to an integer)
   # - key is replaced by an appropriate positive value when < 0
   # - key is made = self._length when key >= self._length (not exactly as before)
   key = slice(key).indices(self._length)[1]

   if value == self._default:
       self._entries.pop(key, None) # assuming _entries is of type dict
   else:
       self._entries[key] = value

Upvotes: 0

pylover
pylover

Reputation: 8075

Use assert instead:

if not isinstance(key, int):
   raise TypeError('list indices must be be integers')

it's more faster than "if ....: raise exception"

Upvotes: -1

Chris W.
Chris W.

Reputation: 39279

Why don't you try replacing...

if not isinstance(key,int):
    raise TypeError('list indices must be integers')

...with...

key = int(key)

I believe that will end up being a faster operation and it seems that it will be more flexible in that if someone hands your function something that can be converted to an integer it will still work.


You might also consider simply not checking for they key type at all. Simply document that using anything other than an int is undefined behavior, and then it's the user's responsibility to be sure they use it properly.

Upvotes: 2

Related Questions