Mapping diagonal

Question

Let's say I have the following dataframe:

import numpy as np
import pandas as pd

idx = ['H',"A","B","C","D"]
idxp = idx[1:] + [idx[0]]
idxm = [idx[-1]] + idx[:-1]
idx, idxp, idxm
j = np.arange(25).reshape(5,5)
J = pd.DataFrame(j, index=idx, columns=idx)
np.fill_diagonal(J.values, 0)
J

As an output, I would like to get an array such that:

it has zeros on the diagonal and everywhere below it
the upper part is built from the numbers just above the diagonal of the matrix J, i.e. the vector v = [1, 7, 13, 19]
the first row is the cumulative sum of v from start to end, giving [1, 8, 21, 40]
the second row is the cumulative sum of v from the second element to the end, giving [7, 20, 39]
and so on, until the last element of v is reached

In other words, that would give us the matrix below:

m_exp = np.array([[0,1,8,21,40],
             [0,0,7,20,39],
             [0,0,0,13,32],
             [0,0,0,0,19],
             [0,0,0,0,0],
             ])

The best way I have found so far to calculate this matrix is by using the code below:

travelup = np.array([np.pad(np.cumsum(J.values.diagonal(1)[n:]), (n+1,0), 'constant') for n in range(J.values.shape[0])])

However, this involves a list comprehension, and in practice my matrix is much bigger and this code is called thousands of times.

Is there any way to transform the process by using a mapping, to make it faster and avoid looping?

Divakar · Accepted Answer

A few methods are listed below.

I. Basic method

a = J.values
p = np.r_[0,a.ravel()[1::a.shape[1]+1]] # or np.r_[0,np.diag(a,1)]
n = len(p)
out = np.triu(np.broadcast_to(p,(n,n)),1).cumsum(1)

p and n are re-used in the alternatives listed next.
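As a sanity check, the basic method can be run end to end on the question's 5x5 example and compared with the list-comprehension version. This is only a minimal sketch: it works on a plain NumPy array instead of the DataFrame, on the assumption that only the values matter, and `travelup` is the name taken from the question.

```python
import numpy as np

# Rebuild the question's example values (the pandas labels are not needed here).
a = np.arange(25).reshape(5, 5)
np.fill_diagonal(a, 0)

# Superdiagonal with a leading zero: [0, 1, 7, 13, 19]
p = np.r_[0, np.diag(a, 1)]
n = len(p)

# Repeat p on every row, keep only entries strictly above the diagonal,
# then take the cumulative sum along each row.
out = np.triu(np.broadcast_to(p, (n, n)), 1).cumsum(1)

# Reference: the list comprehension from the question.
travelup = np.array([
    np.pad(np.cumsum(np.diag(a, 1)[k:]), (k + 1, 0), 'constant')
    for k in range(a.shape[0])
])

assert np.array_equal(out, travelup)
```

The leading zero in p is what shifts the superdiagonal one column to the right, so that row i starts accumulating at column i+1.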

A. Alternative #1

Alternatively, use broadcasted multiplication to get the final output:

out = (~np.tri(n, dtype=bool)*p).cumsum(1)

B. Alternative #2

Alternatively, use outer subtraction on the cumulative sum:

c = p.cumsum()
out = np.triu(c-c[:,None])
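The outer subtraction works because the difference of two prefix sums is the sum of the elements between them: c[j] - c[i] equals p[i+1] + ... + p[j] whenever j > i. The sketch below illustrates this on the example values; `diff`, `i` and `j` are names introduced here purely for illustration.

```python
import numpy as np

p = np.array([0, 1, 7, 13, 19])  # superdiagonal of the example, with a leading zero
c = p.cumsum()                   # prefix sums: [0, 1, 8, 21, 40]

# diff[i, j] = c[j] - c[i]; for j > i this is p[i+1] + ... + p[j]
diff = c - c[:, None]

# Below the diagonal, diff holds the negated sums, so np.triu zeroes them out;
# the diagonal itself is already zero.
out = np.triu(diff)

# Spot check one entry against a direct slice sum.
i, j = 1, 3
assert out[i, j] == p[i + 1:j + 1].sum()
```

Compared with the basic method, this variant takes a single cumulative sum over the 1D vector rather than one along every row of an n x n array.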

C. Alternative #3

Alternatively, use np.tri in place of np.triu:

out = (c-c[:,None])*~np.tri(n, dtype=bool)

c is re-used in the alternatives listed next.

II. With numexpr

For large arrays, leverage multiple cores with numexpr. The alternatives would then be:

import numexpr as ne

out = ne.evaluate('(c-c2D)*M',{'c2D':c[:,None],'M':~np.tri(n, dtype=bool)})

A. Alternative #1

out = ne.evaluate('(c-c2D)*(~M)',{'c2D':c[:,None],'M':np.tri(n, dtype=bool)})

B. Alternative #2

r = np.arange(n)
out = ne.evaluate('(c-c2D)*(r2D
