Reputation: 89
So I have this definition here,
DP[i,j] = f[i,j] + min(DP[i−1, j −1], DP[i−1, j], DP[i−1, j +1])
which defines the minimum accrued cost to go from the top of the NxM matrix to the bottom of the matrix. Each cell in f represents a value/cost (1.2, 0, 10, etc.) to travel to that cell from another cell.
The matrix may be large (1500x1500, It's Gradient map of an image), and the DP algorithm I programmed came out to be about a second per run for my matrices. This matrix needs to run hundreds of times per execution, so total program run time comes out to be several minutes long. This loop is about 99% of my bottleneck, so I am trying to optimize this loop with Python/numpys vectorization methods. I only have access to Numpy, and Scipy.
Note: I don't program in python hardly at all, so the solution may just be obvious idk.
First attempt, Just the straightforward loop, time here is about 2-2.5 seconds per run
DP = f.copy()
for r in range(2, len(DP) - 1): # Start at row 2 since row one doesn't change
for c in range(1, len(DP[0]) - 1):
DP[r][c] += min(DP[r - 1, c-1:c+2])
Second attempt, I tried to leverage some numpy vectorizations functions "fromiter" to calculate entire rows at a time rather than column by column, time here is about 1-1.5 seconds per run. My goal is to get this at least an order of magnitude faster, but I am stumped on how else I can optimize this.
DP = f.copy()
for r in range(2, len(DP) - 1):
def foo(arr):
idx, val = arr
if idx == 0 or idx == len(DP[[0]) - 1:
return np.inf
return val + min(DP[r - 1, idx - 1], DP[r - 1, idx], DP[r - 1, idx + 1])
DP[r, :] = np.fromiter(map(foo, enumerate(DP[r, :])))
Upvotes: 1
Views: 129
Reputation: 803
As hpaulj stated, being your problem inherently sequential it will be hard to fully vectorize, although it seems possible (every cell is updated based on values of the row r=2
, the difference is the considered number of triplets from row 2 for each of the following rows) so perhaps you can find a smart way to do it!
That being said, a quick and half-vectorized solution would be to use the neat way of performing sliding windows with fancy indexing proposed by user42541, so we replace the inner loop with a vectorized call:
indexer = np.arange(3)[:,None] + np.arange(DP.shape[1] - 2)[None,:]
for r in range(2, DP.shape[0] - 1):
DP[r,1:-1] += np.min(DP[r-1,indexer], axis = 0)
This results in a speed-up relative to your double loop method (your vectorized solution didn't work in my pc) of about two orders of magnitude for a 1500x1500 array of integers.
Upvotes: 2