Select n row have highest combined value from a matrix in R

Question

This is part of a large matrix (dimension around: 1'000-1'000'000 rows x 100 - 1'000 columns):

     scen_1   scen_2  scen_3    scen_4 ...
...
9  3.262275 0.000000 0.00000 0.0000000 ...
10 2.843631 0.000000 1.22636 1.0559217 ...
11 0.000000 0.000000 0.00000 0.9836209 ...
12 2.572686 0.000000 0.00000 1.1000293 ...
13 0.000000 0.000000 0.00000 0.0000000 ...
14 0.611070 1.478159 0.00000 0.0000000 ...
15 0.000000 0.000000 0.00000 0.0000000 ...
16 0.000000 0.000000 0.00000 1.0146529 ...
...

Now, I want to select n rows, which - after getting for each column the maximum - have the highest sum, thus row that complement each other well. E.g. I select row 9 and 10 I get the combined (max values) vector 3.262275 0.00000 1.22636 1.0559217 with a sum of 5.5445567. Whereas if I select 14 and 16 I get 0.611070 1.478159 0.00000 1.0146529 with a sum of 3.1038819, thus the first option is better.

The solution for the above example for an n of 3 would be rows, 10, 14 and 9. I hope I could explain to problem well.

My approach would be to first select the row with the highest row wise sum, then literally the rows that would add the highest additional value. But I have the strong feeling that this not always gives the best solution. Calculating all possibilities combinations is not viable due to the size of the matrix. Is a genetic algorithm a solution? Or is there a simpler approach? Thanks.

Edit:

For easier understanding, here is an MWE:

# Create example matrix
mat <- matrix(c(1.562275, 0.000000, 0.00000, 0.0000000,2.843631, 0.000000, 1.22636, 1.0559217,0.000000, 0.000000, 0.00000, 0.9836209,1.572686, 0.000000, 0.00000, 1.8000293,0.000000, 0.000000, 0.00000, 0.0000000,1.611070, 1.478159, 0.00000, 0.0000000,0.000000, 0.000000, 0.00000, 0.0000000,0.000000, 0.000000, 0.00000, 1.0146529), byrow = TRUE,  ncol = 4, dimnames = list(c(9:16), c("scen_1",  "scen_2",  "scen_3", "scen_4")))

# Function to evaluate each combination of rows (this value should be maximized)
get_combined_max_value_sum <- function(choosen_rows){
  # Select rows
  sel_mat <- mat[choosen_rows,]
  
  # calculate columwise max
  max_mat <- apply(sel_mat, 2, max)
  
  # Sum the values
  return(sum(max_mat))
}

# I am looking for the function best_rows() which returns the rows, which gives the 
# maximum value (or at least a close guess) for the get_combined_max_value_sum() 
# function
best_rows <- function(n_rows){
  result <- vector()
  
  # do some magic
  
  return(result) # vector with length n_row for the "best" rows.
}

# ------------------------------------------------
# @ slamballais
# The rows with the highest rowise sum (10 & 12)
get_combined_max_value_sum(c("10","12"))

# get a lower score then row 9 and 13
get_combined_max_value_sum(c("10","14"))

ThomasIsCoding · Accepted Answer

Update (Recursive approach, sub-optimal solution)

You can define a recursive function f (see it within function thomas2), which can be any number of rows k (1 <= k <= nrow(mat))

thomas2 <- function(mat, k) {
  f <- function(mat, k) {
    if (k == 1) {
      return(which.max(rowSums(mat)))
    }
    p <- f(mat, k - 1)
    q <- seq(nrow(mat))[-p]
    rmax <- apply(mat[p, , drop = FALSE], 2, max)
    c(p, q[which.max(sapply(q, function(k) sum(pmax(rmax, mat[k, ]))))])
  }
  row.names(mat)[sort(f(mat, k))]
}

For example

> thomas2(mat, 2)
[1] "10" "14"

> thomas2(mat, 3)
[1] "10" "12" "14"

> thomas2(mat, 4)
[1] "9"  "10" "12" "14"

> thomas2(mat, 5)
[1] "9"  "10" "11" "12" "14"

> thomas2(mat, 6)
[1] "9"  "10" "11" "12" "13" "14"

Previous answer (Brute-force approach, inefficient)

Your algorithm is a greedy one, which cannot guarantee the global maximum always. Thus, a brute-force way might a straightforward workaround to reach your goal.

Maybe you can try the following brute-force method

rs <- combn(nrow(mat), 3)
row.names(mat)[rs[, which.max(apply(rs, 2, function(k) sum(do.call(pmax, data.frame(t(mat[k, ]))))))]]

which gives

[1] "10" "12" "14"

Select n row have highest combined value from a matrix in R

Answers (2)

Update (Recursive approach, sub-optimal solution)

Previous answer (Brute-force approach, inefficient)

Related Questions