Reputation: 769
I'm looking to optimize a square of data (maximum score), where each row is chosen without replacement. Here's a small example, but I'd like an algorithm which would allow for a 30x30 table.
opt_table = data.frame(player = c('A', 'B', 'C'),
first = c(0.5, 0.4, 0.4),
second = c(0.4, 0.7, 0.2),
third = c(0.2, 0.4, 0.3))
The maximum score would be the highest total when adding the chosen scores by column. Here, it would be 0.5 (A) + 0.7 (B) + 0.3 (C) = 1.5. You can't solve it algorithmically by always taking the maximum row of a given column, because it is without replacement.
Upvotes: 4
Views: 109
Reputation: 102469
This is an assignment problem, which can be solved if you use lp.assign
from package lpSolve
, i.e.,
library(lpSolve)
z <- lp.assign(-as.matrix(opt_table[-1]))
maxscore <- -z$objval
assignment <- colnames(opt_table[-1])[which(t(z$solution != 0), arr.ind = TRUE)[, "row"]]
and you will see
> maxscore
[1] 1.5
> assignment
[1] "first" "second" "third"
Upvotes: 4
Reputation: 226732
I have no idea if this is anywhere close to optimal; there might be some clever way to reduce this to a class of known optimization problems. In the meantime, a brute-force Monte Carlo swap + optim(..., method="SANN")
seems workable.
First, define the objective function and the update function (which randomly swaps two positions).
swap <- function(x,...) {
s <- sample(length(x), 2, replace=FALSE)
x[s] <- x[rev(s)]
return(x)
}
objfun <- function(x,M) {
sum(M[cbind(x,seq(ncol(M)))])
}
I checked that this works on the trivial problem, now let's try it on a 30x30 matrix.
set.seed(101)
M2 <- matrix(abs(rnorm(900)),30)
start <- sample(30)
optim(par=start, fn=objfun, gr=swap, control=list(fnscale=-1,
trace=TRUE, maxit=1e6),
method="SANN", M=M2)
(I set fnscale
to -1 because optim
likes to minimize. When tracing, the negative of the objective function is printed ...)
It starts at a value of 22.1 and gets to 53.06. The last improvement (from 52.31 to 53.06) is found at step 796000.
The best out of a million random draws (r <- replicate(1e6, objfun(sample(30), M=M2))
) was 39.5.
Tuning the simulated annealing parameters might improve performance. Or you could try some other stochastic global optimization approach (e.g. genetic algorithm).
Upvotes: 3
Reputation: 2050
Take the max of each column (exclude the "player" column) and sum them.
library(dplyr)
data.frame(player = c('A', 'B', 'C'),
first = c(0.5, 0.4, 0.4),
second = c(0.4, 0.7, 0.2),
third = c(0.2, 0.4, 0.3)) %>%
summarise_at(vars(-player), funs(max)) %>%
rowSums()
note that it looks like 0.5 (A) + 0.7 (B) + 0.3 (C) = 1.5
should actually be:
0.5 (A) + 0.7 (B) + 0.4 (C) = 1.6
Upvotes: 0