user4648092

Reputation: 43

Scipy NDimage correlate: unbearably slow

I'm running code using Python 3.3. The goal is to correlate data_3d (a 1000x1000x1000 Boolean ndarray) with kernel (a 35x35x35 float ndarray).

I then perform another correlation to sum over the previous result. As such, I am correlating another 1000x1000x1000 Boolean ndarray with a 35x35x35 float ndarray, which is exactly the same operation as the previous step.

Here's where I get confused: the first correlation completes in 70 seconds; the second (seemingly identical) process never completes (i.e. I waited over an hour...).

I've tried reducing the size of the second correlation (e.g., correlating with a 5x5x5 array), with the same results.

Presumably, this isn't a memory issue; memory plateaus at 18 GB during the second process (but I still have 14 GB available...).

What's going on?

Here's my code:

import time

import numpy as np
from scipy import ndimage as im

Part A:

t1 = time.time()
# Used to time the process
# a is a np.ndarray of dtype = bool
a = im.correlate(data_3d, kernel) > threshold
t2 = time.time()
print(t2 - t1)  # About 70 seconds

Part B: This next part never completes!

b = im.correlate(a, np.ones((35, 35, 35)))
t3 = time.time()

EDIT: I found the solution. The kernel in Part A is very sparse, while the kernel in Part B is fully populated. SciPy must have some behind-the-scenes optimization that effectively reduces the filter size for sparse kernels... This makes the time complexity roughly O(N^3) for A and O(N^3 * n^3) for B, where N is the 1D size of the image and n is the 1D size of the kernel.

Upvotes: 3

Views: 2245

Answers (1)

rth

Reputation: 11201

It's not that the correlation operator is slow, but rather that your problem is very large.

The direct correlation (or convolution) of a 3D array of size N by a kernel of size n involves roughly N**3 * (2*n**3) floating point operations. So with a fairly recent CPU at about 10 GFLOP/s per core, a problem of this size will take at least 2.4 hours, even without accounting for the memory copy overhead.
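For instance, plugging in the numbers from the question (N = 1000, n = 35) and an assumed sustained rate of 10 GFLOP/s on one core:

N, n = 1000, 35              # image and kernel size per dimension
flops = N**3 * 2 * n**3      # ~8.6e13 floating point operations
rate = 10e9                  # assumed FLOP/s for a single core
print(flops / rate / 3600)   # ~2.4 hours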

Other factors can speed up the calculation, though, such as multi-threading, or the use of sparse kernels. In the latter case, the complexity can be reduced from O(N**3 * n**3) to O(N**3), which would explain the difference in execution time between step 1 and step 2 (as pointed out by the author of the question).
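A quick way to check whether kernel sparsity matters on a given SciPy version is to time both cases on a smaller array; this is only an illustrative sketch with made-up sizes:

import time
import numpy as np
from scipy import ndimage as im

data = (np.random.rand(200, 200, 200) > 0.5).astype(np.float64)  # small test volume

dense = np.ones((11, 11, 11))   # fully populated kernel
sparse = np.zeros((11, 11, 11))
sparse[5, 5, 5] = 1.0           # a single non-zero weight

for name, k in [("sparse", sparse), ("dense", dense)]:
    t0 = time.time()
    im.correlate(data, k)
    print(name, time.time() - t0)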

For step 2, an FFT-based approach with scipy.signal.fftconvolve (the kernel would need to be flipped to perform a cross-correlation) might be faster, particularly if the problem size N can be made equal to a power of 2 (e.g. 1024).
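A minimal sketch of that idea, assuming a is the boolean result from Part A (note that fftconvolve zero-pads at the borders, unlike ndimage.correlate's default 'reflect' mode, so values near the edges will differ, and the float copy of a 1000^3 array alone takes about 8 GB):

import numpy as np
from scipy.signal import fftconvolve

kernel_b = np.ones((35, 35, 35))

# Flipping a real kernel along every axis turns convolution into correlation.
flipped = kernel_b[::-1, ::-1, ::-1]

# Cast the boolean array to float; mode='same' keeps the output the same shape
# as the input, matching ndimage.correlate.
b = fftconvolve(a.astype(np.float64), flipped, mode='same')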

Upvotes: 4
