santoku
santoku

Reputation: 3427

What's a more efficient way to calculate the max of each row in a matrix excluding its own column?

For a given 2D matrix np.array([[1,3,1],[2,0,5]]) if one needs to calculate the max of each row in a matrix excluding its own column, with expected example return np.array([[3,1,3],[5,5,2]]), what would be the most efficient way to do so? Currently I implemented it with a loop to exclude its own col index:

n=x.shape[0]
row_max_mat=np.zeros((n,n))
rng=np.arange(n)
for i in rng:
  row_max_mat[:,i] = np.amax(s_a_array_sum[:,rng!=i],axis=1)

Is there a faster way to do so?

Upvotes: 7

Views: 346

Answers (4)

Divakar
Divakar

Reputation: 221564

Since, we are looking to get max excluding its own column, basically the output would have each row filled with the max from it, except for the max element position, for which we will need to fill in with the second largest value. As such, argpartition seems would fit right in there. So, here's one solution with it -

def max_exclude_own_col(m):
    out = np.full(m.shape, m.max(1, keepdims=True))
    sidx = np.argpartition(-m,2,axis=1)
    R = np.arange(len(sidx))
    s0,s1 = sidx[:,0], sidx[:,1]
    mask =  m[R,s0]>m[R,s1]  
    L1c,L2c = np.where(mask,s0,s1), np.where(mask,s1,s0)
    out[R,L1c] = m[R,L2c]
    return out

Benchmarking

Other working solution(s) for large arrays -

# @Alain T.'s soln
def max_accum(m):
    fmax = np.maximum.accumulate(m,axis=1)
    bmax = np.maximum.accumulate(m[:,::-1],axis=1)[:,::-1]

    r = np.full(m.shape,np.min(m))
    r[:,:-1] = np.maximum(r[:,:-1],bmax[:,1:])
    r[:,1:]  = np.maximum(r[:,1:],fmax[:,:-1])
    return r

Using benchit package (few benchmarking tools packaged together; disclaimer: I am its author) to benchmark proposed solutions.

So, we will test out with large arrays of various shapes for timings and speedups -

In [54]: import benchit

In [55]: funcs = [max_exclude_own_col, max_accum]

In [170]: inputs = [np.random.randint(0,100,(100000,n)) for n in [10, 20, 50, 100, 200, 500]]

In [171]: T = benchit.timings(funcs, inputs, indexby='shape')                                                             

In [172]: T
Out[172]: 
Functions   max_exclude_own_col  max_accum
Shape                                     
100000x10              0.017721   0.014580
100000x20              0.028078   0.028124
100000x50              0.056355   0.089285
100000x100             0.103563   0.200085
100000x200             0.188760   0.407956
100000x500             0.439726   0.976510

# Speedups with max_exclude_own_col over max_accum
In [173]: T.speedups(ref_func_by_index=1)
Out[173]: 
Functions   max_exclude_own_col  Ref:max_accum
Shape                                         
100000x10              0.822783            1.0
100000x20              1.001660            1.0
100000x50              1.584334            1.0
100000x100             1.932017            1.0
100000x200             2.161241            1.0
100000x500             2.220725            1.0

Upvotes: 1

Alain T.
Alain T.

Reputation: 42143

You could do this using np.accumulate. Compute the forward and backward accumulations of maximums along the horizontal axis and then combine them with an offset of one:

import numpy as np

m = np.array([[1,3,1],[2,0,5]])

fmax = np.maximum.accumulate(m,axis=1)
bmax = np.maximum.accumulate(m[:,::-1],axis=1)[:,::-1]

r = np.full(m.shape,np.min(m))
r[:,:-1] = np.maximum(r[:,:-1],bmax[:,1:])
r[:,1:]  = np.maximum(r[:,1:],fmax[:,:-1])

print(r)

# [[3 1 3]
#  [5 5 2]]

This will require 3x the size of your matrix to process (although you could take that down to 2x if you want an in-place update). Adding a 3rd&4th dimension could also work using a mask but that will require columns^2 times matrix's size to process and will likely be slower.

If needed, you can apply the same technique column wise or to both dimensions (by combining row wise and column wise results).

Upvotes: 2

Quang Hoang
Quang Hoang

Reputation: 150735

Similar idea to yours (exclude columns one by one), but with indexing:

mask = ~np.eye(cols, dtype=bool)
a[:,np.where(mask)[1]].reshape((a.shape[0], a.shape[1]-1, -1)).max(1)

Output:

array([[3, 1, 3],
       [5, 5, 2]])

Upvotes: 3

Marco Cerliani
Marco Cerliani

Reputation: 22031

a = np.array([[1,3,1],[2,0,5]])

row_max = a.max(axis=1).reshape(-1,1)
b = (((a // row_max)+1)%2)
c = b*row_max
d = (a // row_max)*((a*b).max(axis=1).reshape(-1,1))

c+d # result

Upvotes: 1

Related Questions