speed up replication of rows using model

Question

I would like to create replicated predictions for one integer independent variable (iv1), given some model and a data frame called training. This is my current approach. I appreciate that it is not self-contained, but hopefully it is self-explanatory:

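number_of_samples <- 10
results <- NULL
for (row in 1:nrow(training)) {

    # draw number_of_samples distinct iv1 values for this row
    fake_iv1_values <- sample(1:100, number_of_samples)
    case <- training[row,]

    for (iv1 in fake_iv1_values) {
        case$iv1 <- iv1

        # one predict() call per fake value
        case$prediction <- predict(some_model, newdata = case)

        # rbind() builds a new copy of results on every iteration
        results <- rbind(results, case)
    }
}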

Using loops is very slow. Could this be sped up? Thanks!
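In the loop above, two things cost time: predict() is called once per fake value, and rbind() copies the growing results data frame on every iteration, so the total copying grows roughly quadratically with the number of rows. If the loop structure has to be kept, a minimal sketch of a middle ground is to predict all fake values for one training row at once, store each piece in a preallocated list, and bind everything once at the end. The training and some_model objects below are hypothetical toy stand-ins, not the asker's real data.

```r
# Hypothetical toy stand-ins for the question's `training` and `some_model`
set.seed(1)
training <- data.frame(y   = rnorm(50),
                       iv1 = sample(1:100, 50, replace = TRUE),
                       x1  = rnorm(50))
some_model <- lm(y ~ iv1 + x1, data = training)

number_of_samples <- 10

# Preallocate one list slot per training row instead of growing `results`
pieces <- vector("list", nrow(training))
for (row in seq_len(nrow(training))) {
  # Replicate the current row once per fake value, then overwrite iv1
  case <- training[rep(row, number_of_samples), ]
  case$iv1 <- sample(1:100, number_of_samples)
  # One predict() call per training row instead of one per fake value
  case$prediction <- predict(some_model, newdata = case)
  pieces[[row]] <- case
}

# Bind all pieces once at the end
results <- do.call(rbind, pieces)
rownames(results) <- NULL
```

This still loops over rows, so it is only a partial fix; removing the loop entirely avoids the per-row overhead as well.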

Edo · Accepted Answer

Try this vectorised approach instead of the loops.

Reproducible fake data and model:

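# create fake data (fixed seed so the example is reproducible)
set.seed(42)
n_row <- 1000   # more rows than model coefficients, so lm() is not rank-deficient
n_xs  <- 100

# iv1 is an integer predictor, as in the question
training <- data.frame(y = rnorm(n_row), iv1 = sample(1:100, n_row, replace = TRUE))
# add n_xs extra numeric predictors named x1, x2, ..., x100
training[, paste0("x", 1:n_xs)] <- replicate(n_xs, list(rnorm(n_row)))

# example model: y regressed on every other column
some_model <- lm(y ~ ., training)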

Rewritten code:

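number_of_samples <- 10

# one column of number_of_samples draws per training row; as.numeric() flattens
# the matrix column by column, so the draws for row i sit next to each other
fake_iv1_values <- as.numeric(replicate(nrow(training), sample(1:100, number_of_samples)))

# replicate each row of the original data frame number_of_samples times, in order
results <- training[rep(seq_len(nrow(training)), each = number_of_samples), ]
rownames(results) <- NULL

# overwrite iv1 with the fake values (same ordering as the replicated rows)
results$iv1 <- fake_iv1_values

# a single vectorised predict() call replaces one call per row and fake value
results$prediction <- predict(some_model, newdata = results)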
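A quick way to convince yourself that the vectorised version produces the same rows as the original loop is to run both on a small data set with the same random seed. Both versions call sample(1:100, number_of_samples) once per training row, in row order, so with an identical seed they should draw identical fake values. The small data set and model below are hypothetical and only serve the comparison.

```r
# Small hypothetical data set and model, only for the comparison
set.seed(123)
training <- data.frame(y   = rnorm(20),
                       iv1 = sample(1:100, 20, replace = TRUE),
                       x1  = rnorm(20))
some_model <- lm(y ~ iv1 + x1, data = training)
number_of_samples <- 5

# Original loop (from the question)
set.seed(2024)
loop_results <- NULL
for (row in seq_len(nrow(training))) {
  fake_iv1_values <- sample(1:100, number_of_samples)
  case <- training[row, ]
  for (iv1 in fake_iv1_values) {
    case$iv1 <- iv1
    case$prediction <- predict(some_model, newdata = case)
    loop_results <- rbind(loop_results, case)
  }
}

# Vectorised version (from the answer), reusing the same seed
set.seed(2024)
fake_iv1_values <- as.numeric(replicate(nrow(training), sample(1:100, number_of_samples)))
vec_results <- training[rep(seq_len(nrow(training)), each = number_of_samples), ]
vec_results$iv1 <- fake_iv1_values
vec_results$prediction <- predict(some_model, newdata = vec_results)

# Compare fake values and predictions, ignoring names and row names
same_iv1  <- isTRUE(all.equal(as.numeric(loop_results$iv1), vec_results$iv1))
same_pred <- isTRUE(all.equal(unname(loop_results$prediction),
                              unname(vec_results$prediction)))
```

Wrapping each half in system.time() on a larger training set should show the loop falling further behind as nrow(training) grows.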
