Tianran Zhang
Tianran Zhang

Reputation: 9

how to speed up nested loops in Python

I am trying to write a piece of nested loops in my algorithm, and meet some problems that the whole algorithm takes too long time due to these nested loops. I am quite new to Python (as you may find from my below unprofessional code :( ) and hopefully someone can guide me a way to speed up my code!

The whole algorithm is for fire detection in multi 1500*6400 arrays. A small contextual analyse is applied when go through the whole array. The contextual analyse is performed in a dynamically assigned windows size way. The windows size can go from 11*11 to 31*31 until the validate values inside the sampling windows are enough for the next round calculation, for example like below:

def ContextualWindows (arrb4,arrb5,pfire):
####arrb4,arrb5,pfire are 31*31 sampling windows from large 1500*6400 numpy array
    i=5
    while i in range (5,16):
        arrb4back=arrb4[15-i:16+i,15-i:16+i]
        ## only output the array data when it is 'large' enough 
        ## to have enough good quality data to do calculation       
        if np.ma.count(arrb4back)>=min(10,0.25*i*i):
            arrb5back=arrb5[15-i:16+i,15-i:16+i]
            pfireback=pfire[15-i:16+i,15-i:16+i]
            canfire=0
            i=20
        else: 
            i=i+1

###unknown pixel: background condition could not be characterized
    if i!=20: 
        canfire=1
        arrb5back=arrb5
        pfireback=pfire
        arrb4back=arrb4
    return (arrb4back,arrb5back,pfireback,canfire)

Then this dynamic windows will be feed into next round test, for example:

b4backave=np.mean(arrb4Windows)
b4backdev=np.std(arrb4Windows)
if b4>b4backave+3.5*b4backdev:
    firetest=True

To run the whole code to my multi 1500*6400 numpy arrays, it took over half an hour, or even longer. Just wondering if anyone got an idea how to deal with it? A general idea which part I should put my effort to would be greatly helpful!

Many thanks!

Upvotes: 0

Views: 1869

Answers (2)

hpaulj
hpaulj

Reputation: 231385

I've run some time tests with ContextualWindows and variants. One i step takes about 50us, all ten about 500.

This simple iteration takes about the same time:

[np.ma.count(arrb4[15-i:16+i,15-i:16+i]) for i in range(5,16)]

The iteration mechanism, and the 'copying' arrays are minor parts of the time. Where possible numpy is making views, not copies.

I'd focus on either minimizing the number of these count steps, or speeding up the count.


Comparing times for various operations on these windows:

First time for 1 step:

In [167]: timeit [np.ma.count(arrb4[15-i:16+i,15-i:16+i]) for i in range(5,6)]
10000 loops, best of 3: 43.9 us per loop

now for the 10 steps:

In [139]: timeit [arrb4[15-i:16+i,15-i:16+i].shape for i in range(5,16)]
10000 loops, best of 3: 33.7 us per loop

In [140]: timeit [np.sum(arrb4[15-i:16+i,15-i:16+i]>500) for i in range(5,16)]
1000 loops, best of 3: 390 us per loop

In [141]: timeit [np.ma.count(arrb4[15-i:16+i,15-i:16+i]) for i in range(5,16)]
1000 loops, best of 3: 464 us per loop

Simply indexing does not take much time, but testing for conditions takes substantially more.

cumsum is sometimes used to speed up sums over sliding windows. Instead of taking sum (or mean) over each window, you calculate the cumsum and then use the differences between the front and end of window.

Trying something like that, but in 2d - cumsum in both dimensions, followed by differences between diagonally opposite corners:

In [164]: %%timeit
   .....: cA4=np.cumsum(np.cumsum(arrb4,0),1)
   .....: [cA4[15-i,15-i]-cA4[15+i,15+i] for i in range(5,16)]
   .....: 
10000 loops, best of 3: 43.1 us per loop

This is almost 10x faster than the (nearly) equivalent sum. Values don't quite match, but timing suggest that this may be worth refining.

Upvotes: 0

user1016274
user1016274

Reputation: 4209

Avoid while loops if speed is a concern. The loop lends itself to a for loop as start and end are fixed. Additionally, your code does a lot of copying which isn't really necessary. The rewritten function:

def ContextualWindows (arrb4,arrb5,pfire):
    ''' arrb4,arrb5,pfire are 31*31 sampling windows from 
        large 1500*6400 numpy array '''

    for i in range (5, 16):
        lo = 15 - i  # 10..0
        hi = 16 + i  # 21..31
        #  only output the array data when it is 'large' enough
        #  to have enough good quality data to do calculation
        if np.ma.count(arrb4[lo:hi, lo:hi]) >= min(10, 0.25*i*i):
            return (arrb4[lo:hi, lo:hi], arrb5[lo:hi, lo:hi], pfire[lo:hi, lo:hi], 0)
    else:    #  unknown pixel: background condition could not be characterized
        return (arrb4, arrb5, pfire, 1)

For clarity I've used style guidelines from PEP 8 (like extended comments, number of comment chars, spaces around operators etc.). Copying of a windowed arrb4 occurs twice here but only if the condition is fulfilled and this will happen only once per function call. The else clause will be executed only if the for-loop has run to it's end. We don't even need a break from the loop as we exit the function altogether.
Let us know if that speeds up the code a bit. I don't think it'll be much but then again there isn't much code anyway.

Upvotes: 1

Related Questions