Reputation: 253
I want to find and optimal K parameter for ELO calculation in a dataset. My example:
#load SciViews for ln function
library("SciViews")
ELO1 <- c(1500.000, 1500.000, 1500.000, 1500.000, 1500.000, 1500.000, 1500.000,
1500.000, 1500.000, 1512.500, 1487.500, 1512.500, 1487.500, 1487.500, 1512.500, 1475.898, 1500.000, 1475.000, 1486.617, 1512.516)
ELO2 <- c(1500.000, 1500.000, 1500.000, 1500.000, 1500.000, 1500.000, 1512.500, 1487.500, 1512.500, 1487.500, 1487.500, 1512.500, 1499.550, 1500.450, 1499.550, 1512.950, 1487.050, 1512.950, 1524.102, 1500.000)
dataset <- data.frame("ELO1" = ELO1, "ELO2" = ELO2)
#Lets set if the Player1 was the winner:
dataset$HomeWinner <- c(1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0)
The winning probability for Player 1 according to ELO is calculated the following way:
dataset$P1ExpWin <- 1/(1 + 10^((dataset$ELO2 - dataset$ELO1)/400))
dataset$P2ExpWin <- 1 - dataset$P1ExpWin
Log error is calculated the following way:
dataset$LogError <- dataset$HomeWinner * ln(dataset$P1ExpWin) + (1-dataset$HomeWinner) * ln(1-dataset$P1ExpWin)
The K parameter sets the rate of ELO change and could vary between 4 and 32. Lets set it to 20:
Kparameter = 20
And the ELO change is calculated this way depending on the outcome:
for (row in 1:nrow(dataset)) {
if (dataset[row,"HomeWinner"] == 1) {
dataset$NewELO1[row] <- dataset$ELO1 + Kparameter * (1 - dataset$P1ExpWin)
dataset$NewELO2[row] <- dataset$ELO2 - Kparameter * dataset$P2ExpWin
}
else{
dataset$NewELO1[row] <- dataset$ELO1[row] - Kparameter * dataset$P1ExpWin
dataset$NewELO2[row] <- dataset$ELO2[row] + Kparameter * (1 - dataset$P2ExpWin)
}
}
Finally, log loss is calculated as a sum for every LogError, divided by the count:
LogLossELO = sum(dataset$LogError) * -1/nrow(dataset)
My question is how to use optim or optimize function fo Kparameter to find the minimal LogLossELO (Similar to Solver in Excel)?
Upvotes: 1
Views: 92
Reputation: 269644
I have changed some of the formulas which did not make sense but this may or may not be what you intended. Please review this and correct it if necessary. Note that ln
in SciViews is identical to log
in base R so we replaced that too. The for
loop could be vectorized but we have left it as a loop for easier comparison to the code in the question.
ELO1 <- c(1500.000, 1500.000, 1500.000, 1500.000, 1500.000, 1500.000, 1500.000,
1500.000, 1500.000, 1512.500, 1487.500, 1512.500, 1487.500, 1487.500, 1512.500, 1475.898, 1500.000, 1475.000, 1486.617, 1512.516)
ELO2 <- c(1500.000, 1500.000, 1500.000, 1500.000, 1500.000, 1500.000, 1512.500, 1487.500, 1512.500, 1487.500, 1487.500, 1512.500, 1499.550, 1500.450, 1499.550, 1512.950, 1487.050, 1512.950, 1524.102, 1500.000)
HomeWinner <- c(1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0)
n <- length(HomeWinner)
lloss <- function(K) {
P1ExpWin <- 1/(1 + 10^((ELO2 - ELO1)/400))
P2ExpWin <- 1 - P1ExpWin
for (r in 1:n) {
if (HomeWinner[r] == 1) {
ELO1[r] <- ELO1[r] + K * (1 - P1ExpWin[r])
ELO2[r] <- ELO2[r] - K * P2ExpWin[r]
}
else{
ELO1[r] <- ELO1[r] - K * P1ExpWin[r]
ELO2[r] <- ELO2[r] + K * (1 - P2ExpWin[r])
}
}
P1ExpWin <- 1/(1 + 10^((ELO2 - ELO1)/400))
P2ExpWin <- 1 - P1ExpWin
LogError <- HomeWinner * log(P1ExpWin) + (1-HomeWinner) * log(P1ExpWin)
-mean(LogError) # LogLossELO
}
rng <- c(0, 100)
opt <- optimize(lloss, rng); opt
## $minimum
## [1] 76.05614
##
## $objective
## [1] 0.6833352
Below we plot the result.
curve(Vectorize(lloss)(K), rng[1], rng[2], xname = "K", ylab = "LogLossELO")
with(opt, abline(h = objective, v = minimum))
Upvotes: 1