evan54

Reputation: 3733

What are the benefits / drawbacks of a list of lists compared to a numpy array of OBJECTS with regards to SPEED?

This is a follow-up to this question:

What are the benefits / drawbacks of a list of lists compared to a numpy array of OBJECTS with regards to MEMORY?

I'm interested in understanding the speed implications of using a numpy array vs a list of lists when the array is of type object.

If anyone is interested in the object I'm using:

    import gmpy2 as gm
    gm.mpfr('0')  # <-- this is the object
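
For concreteness, here is a minimal sketch of the two layouts being compared (the 3×3 shape is just an illustration, and it assumes numpy is installed alongside gmpy2):

    import gmpy2 as gm
    import numpy as np

    rows, cols = 3, 3

    # Plain Python: a list of lists holding mpfr objects.
    list_of_lists = [[gm.mpfr('0') for _ in range(cols)] for _ in range(rows)]

    # numpy with dtype=object: the array stores pointers to those same
    # mpfr objects, not the numeric values themselves.
    obj_array = np.array(list_of_lists, dtype=object)

    print(type(obj_array[0, 0]))  # the mpfr type, same objects as in the lists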

Upvotes: 1

Views: 2219

Answers (1)

abarnert

Reputation: 365737

The biggest usual benefits of numpy, as far as speed goes, come from being able to vectorize operations, which means you replace a Python loop around a Python function call with a C loop around some inlined C (or even custom SIMD assembly) code. There are probably no built-in vectorized operations for arrays of mpfr objects, so that main benefit vanishes.
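
A rough way to see that gap is to time the same elementwise operation on a float64 array and on an object array of mpfr values; this is only an illustrative sketch, and the exact numbers will depend on your hardware and precision settings:

    import timeit

    import gmpy2 as gm
    import numpy as np

    n = 10000
    floats = np.arange(n, dtype=np.float64)
    objects = np.array([gm.mpfr(i) for i in range(n)], dtype=object)

    # float64: one C loop over machine doubles (the fast, vectorized path).
    t_float = timeit.timeit(lambda: floats + floats, number=1000)

    # object dtype: numpy still loops in C, but every element goes through a
    # Python-level mpfr.__add__ call, so most of the speedup disappears.
    t_obj = timeit.timeit(lambda: objects + objects, number=1000)

    print(f"float64: {t_float:.3f}s   object/mpfr: {t_obj:.3f}s")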

However, there are some places where you'll still benefit:

  • Some operations that would require a copy in pure Python are essentially free in numpy: transposing a 2D array, slicing a column or a row, and even reshaping are all done by wrapping a pointer to the same underlying data with different striding information. Since your initial question specifically asked about A.T, yes, this is essentially free (see the first sketch after this list).
  • Many operations can be performed in-place more easily in numpy than in Python, which can save you some more copies.
  • Even when a copy is needed, it's faster to bulk-copy a big array of memory and then refcount all of the objects than to iterate through nested lists deep-copying them all the way down.
  • It's a lot easier to write your own custom Cython code to vectorize an arbitrary operation with numpy than with Python.
  • You can still get some benefit from using np.vectorize around a normal Python function, pretty much on the same order as the benefit you get from a list comprehension over a for statement (see the second sketch after this list).
  • Within certain size ranges, if you're careful to use the appropriate striding, numpy can allow you to optimize cache locality (or VM swapping, at larger sizes) relatively easily, while there's really no way to do that at all with lists of lists. This is much less of a win when you're dealing with an array of pointers to objects that could be scattered all over memory than when dealing with values that can be embedded directly in the array, but it's still something.
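
On the first point, here's a minimal sketch (with a hypothetical 3×3 array) showing that .T and column slices are views over the same buffer, so none of the mpfr objects get copied:

    import gmpy2 as gm
    import numpy as np

    A = np.array([[gm.mpfr(r * 10 + c) for c in range(3)] for r in range(3)],
                 dtype=object)

    B = A.T        # a view: same buffer, different strides
    col = A[:, 1]  # slicing a column is also a view

    print(B.base is A)           # True -- B owns no data of its own
    print(A.strides, B.strides)  # same numbers in swapped order

    # Mutating through the view shows up in the original, confirming that
    # nothing was copied.
    B[0, 2] = gm.mpfr('-1')
    print(A[2, 0])  # -1.0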
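
And on the np.vectorize point, a small sketch that wraps gmpy2's sqrt (assuming elementwise square roots are what you're after); the wrapped function still runs as ordinary Python per element, so the win is comprehension-sized rather than vectorization-sized:

    import gmpy2 as gm
    import numpy as np

    values = np.array([gm.mpfr(i) for i in range(1, 6)], dtype=object)

    # np.vectorize does not push gm.sqrt into C; it only moves the loop out
    # of your code.
    vec_sqrt = np.vectorize(gm.sqrt, otypes=[object])

    print(vec_sqrt(values))  # [mpfr('1.0') mpfr('1.414...') ... mpfr('2.236...')]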

As for disadvantages… well, one obvious one is that using numpy restricts you to CPython or sometimes PyPy (hopefully in the future that "sometimes" will become "almost always", but it's not quite there as of 2014); if your code would run faster in Jython or IronPython or non-NumPyPy PyPy, that could be a good reason to stick with lists.

Upvotes: 2
