Reputation: 119
I'm writing my own sparse (one-dimensional) array class but I'm running into some performance issues. Profiling suggests that one of the bottlenecks is my __getitem__
and __setitem__
implementations, and in particular, it seems like one of the culprits might be my use of isinstance
. At the moment I have 5 calls to isinstance
in __getitem__
and I get the following data from cProfile (excerpted):
ncalls tottime percall cumtime percall filename:lineno(function) 86462 0.076 0.000 0.084 0.000 sparse.py:107(__setitem__) 189730 0.147 0.000 0.166 0.000 sparse.py:45(__getitem__) 276366 0.028 0.000 0.028 0.000 {built-in method isinstance}
My __getitem__
implements slicing as well as array access, so I suspect some kind of type introspection is necessary... but I'm wondering if isinstance
is really the best way to do that?
My __setitem__
, on the other hand, doesn't support slicing (and only calls isinstance
once in any case), so I'm at a loss as to how I could make it faster. The per-line profiling data is as follows:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
108 @profile
109 def __setitem__(self, key, value):
110 88705 121012 1.4 23.0 if not isinstance(key, int):
111 raise TypeError('list indices must be be integers')
112
113 88705 95905 1.1 18.3 if key >= self._length:
114 raise IndexError('list index out of range')
115
116 88705 85328 1.0 16.2 if key < 0:
117 key = self._length + key
118
119 88705 89186 1.0 17.0 if value == self._default:
120 35043 37087 1.1 7.1 if key in self._entries:
121 35042 39359 1.1 7.5 del self._entries[key]
122 else:
123 53662 57527 1.1 10.9 self._entries[key] = value
(I'm also willing to accept an answer suggesting a suitable fast sparse array Python module. One of my requirements is the ability to quickly iterate over (the keys of) non-zero entries only.)
Upvotes: 1
Views: 2164
Reputation: 184365
To answer your immediate question, isinstance()
is a slow call because the name is global. You can speed it up significantly simply by adding isinstance=isinstance
to __setitem__()
's signature, like so:
def __setitem__(self, key, value, isinstance=isinstance):
# und so weiter
This converts the global name to a local name, which is significantly faster to look up at run time. As a bonus, the local name is bound to the built-in isinstance
function at function definition time, so there's no overhead initializing the variable when it's called.
As others have pointed out, however, in the code you've shown, you probably don't even need that call, but can simply try to convert the key to an int
, or skip even that. (However, you could get a little speed boost out of adding int=int
to your method signature, as int
is also a global name...)
But if you're going to do the error-checking, you should also test to see if the index is less than zero. What if the length is 50 and the user wants item -100? :-)
Upvotes: 5
Reputation: 1686
How about getting rid of the exceptions?
def __setitem__(self, key, value):
# This checks that:
# - key is an integer (or can be converted to an integer)
# - key is replaced by an appropriate positive value when < 0
# - key is made = self._length when key >= self._length (not exactly as before)
key = slice(key).indices(self._length)[1]
if value == self._default:
self._entries.pop(key, None) # assuming _entries is of type dict
else:
self._entries[key] = value
Upvotes: 0
Reputation: 8075
Use assert instead:
if not isinstance(key, int):
raise TypeError('list indices must be be integers')
it's more faster than "if ....: raise exception"
Upvotes: -1
Reputation: 39279
Why don't you try replacing...
if not isinstance(key,int):
raise TypeError('list indices must be integers')
...with...
key = int(key)
I believe that will end up being a faster operation and it seems that it will be more flexible in that if someone hands your function something that can be converted to an integer it will still work.
You might also consider simply not checking for they key type at all. Simply document that using anything other than an int
is undefined behavior, and then it's the user's responsibility to be sure they use it properly.
Upvotes: 2