Jayyu
Jayyu

Reputation: 329

retrieve original index of sequentially removed column (a row is also removed) of an matrix in Julia or python

I want to retrieve the original index of the column with the largest sum at each iteration after the previous column with the largest sum is removed. Meanwhile, the row of the same index of the deleted column is also deleted from the matrix at each iteration.

For example, in a 10 by 10 matrix, the 5th column has the largest sum, hence the 5th column and row are removed. Now the matrix is 9 by 9 and the sum of columns is recalculated. Suppose the 6th column has the largest sum, hence the 6th column and row of the current matrix are removed, which is the 7th in the original matrix. Do this iteratively until the desired number of columns index is preserved.

My code in Julia that does not work is pasted below. Step two in the for loop is not correct because a row is removed at each iteration, thus the sum of columns are different.

Thanks!

# a matrix of random numbers
mat = rand(10, 10);
# column sum of the original matrix
matColSum = sum(mat, dims=1);

# iteratively remove columns with the largest sum
idxColRemoveList = [];
matTemp = mat;

for i in 1:4  # Suppose 4 columns need to be removed

    # 1. find the index of the column with the largest column sum at current iteration
    sumTemp = sum(matTemp, dims=1);
    maxSumTemp = maximum(sumTemp);
    idxColRemoveTemp = argmax(sumTemp)[2];
    
    # 2. record the orignial index of the removed scenario
    idxColRemoveOrig = findall(x->x==maxSumTemp, matColSum)[1][2];
    push!(idxColRemoveList, idxColRemoveOrig);
    
    # 3. update the matrix. Note that the corresponding row is also removed.
    matTemp = matTemp[Not(idxColRemoveTemp), Not(idxColRemoveTemp)];

end

Upvotes: 1

Views: 134

Answers (2)

Jayyu
Jayyu

Reputation: 329

A simpler way to code the problem would be to replace elements in the selected column with a significantly small number instead of deleting the column. This approach avoids the use of "sort" and "pop" to improve code efficiency.

import numpy as np

n = 1000
mat = np.random.rand(n, n)
n_remove = 500
removed = []

for i in range(n_remove):
    # get sum of each column
    col_sum = np.sum(mat, axis=0)
    col_rm = np.argmax(col_sum)
    # record the column ID
    removed.append(col_rm)
    
    # replace elements in the col_rm-th column and row with the zeros
    mat[:, col_rm] = 1e-10
    mat[col_rm, :] = 1e-10  
   
print(removed)

Upvotes: 0

rpoleski
rpoleski

Reputation: 998

python solution:

import numpy as np

mat = np.random.rand(5, 5)
n_remove = 3

original = np.arange(len(mat)).tolist()
removed = []

for i in range(n_remove):
    col_sum = np.sum(mat, axis=0)
    col_rm = np.argsort(col_sum)[-1]
    removed.append(original.pop(col_rm))
    mat = np.delete(np.delete(mat, col_rm, 0), col_rm, 1)

print(removed)
print(original)
print(mat)

I'm guessing the problem you had was keeping track with information what was the index of current columns/rows in original array. I've just used a list [0, 1, 2, ...] and then pop one value in each iteration.

Upvotes: 1

Related Questions