stats_noob
stats_noob

Reputation: 5925

non-finite value supplied by optim using Pareto optimization

I am trying to follow the instructions from the GPareto R package to optimize an arbitrary function that I defined.

Here is the function (7 inputs, 4 outputs):

library(gpareto)
library(dplyr)

# create some data for this example
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,100,5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1,b1,c1)

funct_set <- function (x) {
    x1 <- x[1]; x2 <- x[2]; x3 <- x[3] ; x4 <- x[4]; x5 <- x[5]; x6 <- x[6]; x[7] <- x[7]
    f <- numeric(4)
   
    #bin data according to random criteria
    train_data <- train_data %>%
        mutate(cat = ifelse(a1 <= x1 & b1 <= x3, "a",
                            ifelse(a1 <= x2 & b1 <= x4, "b", "c")))
    
    train_data$cat = as.factor(train_data$cat)
    
    #new splits
    a_table = train_data %>%
        filter(cat == "a") %>%
        select(a1, b1, c1, cat)
    
    b_table = train_data %>%
        filter(cat == "b") %>%
        select(a1, b1, c1, cat)
    
    c_table = train_data %>%
        filter(cat == "c") %>%
        select(a1, b1, c1, cat)
   
    #calculate  quantile ("quant") for each bin
    
    table_a = data.frame(a_table%>% group_by(cat) %>%
                             mutate(quant = ifelse(c1 > x[5],1,0 )))
    table_b = data.frame(b_table%>% group_by(cat) %>%
                             mutate(quant = ifelse(c1 > x[6],1,0 )))
    table_c = data.frame(c_table%>% group_by(cat) %>%
                             mutate(quant = ifelse(c1 > x[7],1,0 )))
    f[1] = mean(table_a$quant)
    f[2] = mean(table_b$quant)
    f[3] = mean(table_c$quant)
    
    #group all tables
    final_table = rbind(table_a, table_b, table_c)
    # calculate the total mean 
    f[4] = mean(final_table$quant)
   
    return (f)
}

I then set up the specifications for the optimization (e.g. upper and lower bounds, number of iterations):

lower=c(80,80,80,80, 100,200,300)
upper=c(120,120,120,120,200,300,400)
budget <- 25

Next, I run the optimization algorithm:

omEGO <- easyGParetoptim(fn = funct_set, budget = budget, lower = lower, upper = upper)

But this produces an error:

Error in optim(par = parinit, fn = fn, gr = gr, method = "L-BFGS-B", lower = lower, : non-finite value supplied by optim

I tried to look at other posts on Stack Overflow that encountered similar errors: "non-finite value supplied by optim" error when using betareg

But I am not sure how to apply the logic from these posts to fix my problem.

Upvotes: 0

Views: 998

Answers (1)

Ben Bolker
Ben Bolker

Reputation: 226577

Partial solution, which should at least get you started with debugging.

  1. I inserted the following lines:
print(x)
print(f)

before the return(f) statement.

It illustrates lots of places where one of your sub-functions returns NaN, e.g. on the very first line:

      x.1       x.2       x.3       x.4       x.5       x.6       x.7 
 91.55028 117.93290  87.38835  99.87770 100.53399 260.07302 310.29938 
[1]       NaN 0.7160494 0.7023346 0.7090000

(the first line represents the arguments, the second the values of the sub-functions).

  1. I then ran
debug(funct_set)
x1 <- c(91.55028,117.93290,87.38835,99.87770,100.53399,260.07302,310.29938)
funct_set(x1)

i.e., setting the initial values that were passed to the function and stepping through the code to see what happens (where the NaN comes from).

  1. Running table(train_data$cat) interactively after the binning step, we get
  b   c 
486 514 

that is, there are no values in your training data set that satisfy the condition a1 <= x1 & b1 <= x3.

Therefore, the derived table a_table is empty (zero rows); table_a, derived from it, is also empty; table_a$quant is a length-zero vector; and mean(table_a$quant) is NaN (the mean of a zero-length vector is undefined, as sum(x)/length(x) is 0/0).

If empty bins are a reasonable outcome, then you need to decide what objective function value should be returned (it could be something as simple as if (nrow(table_a)==0) 0 else mean(table_a$quant), if zero would be a reasonable return value). Otherwise you need to figure out how to constrain your parameters so this doesn't happen.

Upvotes: 3

Related Questions