Reputation: 2956
I'm trying to write a function that takes a matrix and sets the main-diagonal values (i.e. A[0,0], A[1,1], ..., A[N-1,N-1]) by summing the values in the column below each diagonal cell.
An example:
A = np.array([[0, 0, 0],
              [3, 0, 0],
              [4, 2, 0]])
B = function_to_be_built(A)
B = np.array([[7, 0, 0],
              [3, 2, 0],
              [4, 2, 0]])
where you can see that cell [0,0] = 3 + 4 = 7, and cell [1,1] = 2 (the only value below it in that column).
Of course this could be achieved relatively easily using a for loop, but the code is designed to sit in a central kernel, so efficiency is critical. I feel like there should be a way to achieve this efficiently using numpy... something like using np.tril to select only the values below the diagonal?
Could anyone help me get to the solution, please?
Thanks.
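For reference, here is the straightforward loop version being referred to, as a correctness baseline (the helper name diag_from_below_loop is just for illustration):

```python
import numpy as np

def diag_from_below_loop(A):
    """Naive baseline: set each diagonal entry to the sum of the
    values below it in the same column."""
    B = A.copy()
    for j in range(len(A)):
        B[j, j] = A[j + 1:, j].sum()
    return B

A = np.array([[0, 0, 0],
              [3, 0, 0],
              [4, 2, 0]])
print(diag_from_below_loop(A))
# [[7 0 0]
#  [3 2 0]
#  [4 2 0]]
```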
Upvotes: 1
Views: 89
Reputation: 53029
Here is a solution using np.add.at:
y, x = np.tril_indices(len(A), -1)
np.add.at(A, (x, x), A[y, x])
This adds the below-diagonal column sums in place to the corresponding diagonal elements. If the diagonal entries are not guaranteed to be zero, clear them first with
A[np.arange(len(A)), np.arange(len(A))] = 0
Note that in either case this method does not require the upper triangle to be zero.
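Putting this together on the question's example (written here with a tuple index, the unambiguous form for np.add.at; np.add.at accumulates correctly even though each diagonal index appears multiple times):

```python
import numpy as np

A = np.array([[0, 0, 0],
              [3, 0, 0],
              [4, 2, 0]])

# row/column indices of the strictly lower triangle
y, x = np.tril_indices(len(A), -1)
# add each below-diagonal value onto its column's diagonal cell, in place
np.add.at(A, (x, x), A[y, x])
print(A)
# [[7 0 0]
#  [3 2 0]
#  [4 2 0]]
```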
If this operation is repeated a lot on matrices of the same size, then indices can be precomputed and flattened for increased performance.
# compute once
flatinds = np.ravel_multi_index((y, x), A.shape)  # below-diagonal cells
flatdiag = np.ravel_multi_index((x, x), A.shape)  # target diagonal cells (repeated per summand)
flshdiag = (len(A)+1) * np.arange(len(A))         # diagonal cells (once each)
# use every iteration
Afl = A.ravel()
Afl[flshdiag] = 0  # only if the diagonal may be nonzero
np.add.at(Afl, flatdiag, Afl[flatinds])
Couldn't resist going for a really fast one. It leverages linear indexing and np.add.reduceat. Since most sums run along relatively long stretches of a column, it pays off to do a full (non-lazy) transpose at the beginning and end of the computation. For N >= 5 it consistently beats all the other contenders. If you can keep the matrices involved in column-major (Fortran) order, the two transposes can even be saved, in which case it is faster than all the others from N == 3 onward.
# precompute this (A.shape == (N, N))
finds = np.c_[N * np.arange(N), 1 + (N+1) * np.arange(N)].ravel()[1:-2].copy()

def PPt(A):
    A = A.T.copy()  # column sums become contiguous row stretches
    Af = A.ravel()
    # every other reduceat segment is a below-diagonal run; write the
    # segment sums straight onto the (transposed) diagonal
    Af[:-1:N+1] = np.add.reduceat(Af[:-N], finds)[::2]
    Af[-1] = 0      # last column has nothing below the diagonal
    return A.T.copy()
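Run on the question's example (with finds assembled for N == 3 exactly as above), PPt produces the expected result:

```python
import numpy as np

N = 3
A = np.array([[0, 0, 0],
              [3, 0, 0],
              [4, 2, 0]])

# segment boundaries for np.add.reduceat; for N == 3 this is [1, 3, 5]
finds = np.c_[N * np.arange(N), 1 + (N + 1) * np.arange(N)].ravel()[1:-2].copy()

def PPt(A):
    A = A.T.copy()  # make column sums contiguous
    Af = A.ravel()
    Af[:-1:N + 1] = np.add.reduceat(Af[:-N], finds)[::2]
    Af[-1] = 0
    return A.T.copy()

print(PPt(A))
# [[7 0 0]
#  [3 2 0]
#  [4 2 0]]
```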
Upvotes: 2
Reputation: 18488
If you are sure that the diagonal and the upper triangle are filled with zeros, you can do:
In [43]: B = A + np.diag(A.sum(axis=0))
In [44]: B
Out[44]:
array([[7, 0, 0],
[3, 2, 0],
[4, 2, 0]])
If you cannot guarantee that the upper triangle is zero, you can restrict the sum to the strictly lower triangle:
In [62]: B = A + np.diag(np.tril(A, -1).sum(axis=0))
In [63]: B
Out[63]:
array([[7, 0, 0],
[3, 2, 0],
[4, 2, 0]])
Upvotes: 1
Reputation: 57033
You can take the lower triangular part of A, sum it column-wise, convert the sums into a diagonal matrix, and add the result back to A:
A + np.eye(A.shape[0]) * np.tril(A).sum(axis=0)
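One small caveat worth noting: np.eye returns a float array, so the result is upcast to float64 even when A is integer (np.diag of the sums, as in the other answer, keeps the integer dtype). A quick check on the question's example:

```python
import numpy as np

A = np.array([[0, 0, 0],
              [3, 0, 0],
              [4, 2, 0]])

# np.tril(A) keeps the diagonal, so this assumes the diagonal is zero;
# multiplying by np.eye upcasts the result to float64
B = A + np.eye(A.shape[0]) * np.tril(A).sum(axis=0)
print(B)
# [[7. 0. 0.]
#  [3. 2. 0.]
#  [4. 2. 0.]]
```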
Upvotes: 2