Reputation: 2309

How to avoid using for-loops with numpy?

I have already written the following piece of code, which does exactly what I want, but it goes way too slow. I am certain that there is a way to make it faster, but I cant seem to find how it should be done. The first part of the code is just to show what is of which shape.

two images of measurements (VV1 and HH1)
precomputed values, VV simulated and HH simulated, which both depend on 3 parameters (precomputed for (101, 31, 11) values)
the index 2 is just to put the VV and HH images in the same ndarray, instead of making two 3darrays

VV1 = numpy.ndarray((54, 43)).flatten()
HH1 = numpy.ndarray((54, 43)).flatten()
precomp = numpy.ndarray((101, 31, 11, 2))

two of the three parameters we let vary

comp = numpy.zeros((len(parameter1), len(parameter2)))

for i,(vv,hh) in enumerate(zip(VV1,HH1)):
    comp0 = numpy.zeros((len(parameter1),len(parameter2)))
    for j in range(len(parameter1)):
        for jj in range(len(parameter2)):
            comp0[j,jj] = numpy.min((vv-precomp[j,jj,:,0])**2+(hh-precomp[j,jj,:,1])**2)
    comp+=comp0

The obvious thing i know i should do is get rid of as many for-loops as I can, but I don't know how to make the numpy.min behave properly when working with more dimensions.

A second thing (less important if it can get vectorized, but still interesting) i noticed is that it takes mostly CPU time, and not RAM, but i searched a long time already, but i cant find a way to write something like "parfor" instead of "for" in matlab, (is it possible to make an @parallel decorator, if i just put the for-loop in a separate method?)

edit: in reply to Janne Karila: yeah that definately improves it a lot,

for (vv,hh) in zip(VV1,HH1):
    comp+= numpy.min((vv-precomp[...,0])**2+(hh-precomp[...,1])**2, axis=2)

Is definitely a lot faster, but is there any possibility to remove the outer for-loop too? And is there a way to make a for-loop parallel, with an @parallel or something?

Upvotes: 4

Answers (3)

askewchan

Reputation: 46578

One way to parallelize the loop is to construct it in such a way as to use map. In that case, you can then use multiprocessing.Pool to use a parallel map.

I would change this:

for (vv,hh) in zip(VV1,HH1):
    comp+= numpy.min((vv-precomp[...,0])**2+(hh-precomp[...,1])**2, axis=2)

To something like this:

def buildcomp(vvhh):
    vv, hh = vvhh
    return numpy.min((vv-precomp[...,0])**2+(hh-precomp[...,1])**2, axis=2)

if __name__=='__main__':
    from multiprocessing import Pool
    nthreads = 2
    p = Pool(nthreads)
    complist = p.map(buildcomp, np.column_stack((VV1,HH1)))
    comp = np.dstack(complist).sum(-1)

Note that the dstack assumes that each comp.ndim is 2, because it will add a third axis, and sum along it. This will slow it down a bit because you have to build the list, stack it, then sum it, but these are all either parallel or numpy operations.

I also changed the zip to a numpy operation np.column_stack, since zip is much slower for long arrays, assuming they're already 1d arrays (which they are in your example).

I can't easily test this so if there's a problem, feel free to let me know.

Upvotes: 0

Janne Karila

Reputation: 25207

This can replace the inner loops, j and jj

comp0 = numpy.min((vv-precomp[...,0])**2+(hh-precomp[...,1])**2, axis=2)

This may be a replacement for the whole loop, though all this indexing is stretching my mind a bit. (this creates a large intermediate array though)

comp = numpy.sum(
        numpy.min((VV1.reshape(-1,1,1,1) - precomp[numpy.newaxis,...,0])**2
                 +(HH1.reshape(-1,1,1,1) - precomp[numpy.newaxis,...,1])**2, 
                 axis=2),
        axis=0)

Upvotes: 6

Spencer Rathbun

Reputation: 14910

In computer science, there is the concept of Big O notation, used for getting an approximation of how much work is required to do something. To make a program fast, do as little as possible.

This is why Janne's answer is so much faster, you do fewer calculations. Taking this principle farther, we can apply the concept of memoization, because you are CPU bound instead of RAM bound. You can use the memory library, if it needs to be more complex than the following example.

class AutoVivification(dict):
        """Implementation of perl's autovivification feature."""
        def __getitem__(self, item):
            try:
                return dict.__getitem__(self, item)
            except KeyError:
                value = self[item] = type(self)()
                return value

memo = AutoVivification()

def memoize(n, arr, end):
    if not memo[n][arr][end]:
        memo[n][arr][end] = (n-arr[...,end])**2        
    return memo[n][arr][end]


for (vv,hh) in zip(VV1,HH1):
    first = memoize(vv, precomp, 0)
    second = memoize(hh, precomp, 1)
    comp+= numpy.min(first+second, axis=2)

Anything that has already been computed gets saved to memory in the dictionary, and we can look it up later instead of recomputing it. You can even break down the math being done into smaller steps that are each memoized if necessary.

The AutoVivification dictionary is just to make it easier to save the results inside of memoize, because I'm lazy. Again, you can memoize any of the math you do, so if numpy.min is slow, memoize it too.

Upvotes: 0

How to avoid using for-loops with numpy?

Answers (3)

Related Questions