Repeated Simulation of New Data Prediction with Tidymodels (Parsnip XGBoost)

Question

I have a fitted model, predictive_fit <- fit(workflow, training), that classifies the iris dataset species using XGBoost. The data are pivoted wide so that each species is a dummy column containing 0 or 1. Here, I am trying to predict Virginica based on the sepal and petal columns.

Currently, I have the following code, which takes the test set after the model has been fit and checks whether the model can accurately predict the Virginica species of iris. (Snippet below)

testing_data <-
    test %>%
    bind_cols(
        predict(predictive_fit, test)
    )

I cannot, however, figure out how to scale this up with simulation. If I have another dataset with exactly the same structure, I would like to predict whether it is Virginica 100 times. (Snippet below)

new_iris_data <-
    new_iris_data %>%
    bind_cols(
        replicate(n = 100, predict(predictive_fit, new_iris_data))
    )

However, when I run this on the new data, the same predictions appear to be copied 100 times. What is the appropriate way to repeatedly predict the classification? I would not expect the model to predict exactly the same thing all 100 times, and I'd like to run the predictions n times so that a proportion can be calculated for each row of new data.

I have already tried replicate(), but it appears to return exactly the same results 100 times. I considered a for loop that sets a different seed on each iteration and then runs the predictions, but I was hoping for a more performant solution.
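
For context, a minimal sketch of why the copies are identical and of two ways a per-row proportion could be obtained. It rests on assumptions not stated in the question: the outcome is rebuilt here as a two-level factor named virginica, and the objects wf, training_data and boots, along with the boost_tree() settings, are hypothetical stand-ins for the original workflow and data. Calling predict() on an already fitted model is deterministic, so neither replicate() nor a different seed changes its output. Variation only appears if the model itself is refit, for example on bootstrap resamples of the training data. Alternatively, predict(type = "prob") returns a per-row class probability from the single fitted model.

```r
library(tidymodels)  # attaches dplyr, purrr, rsample, parsnip, workflows

set.seed(123)

# Hypothetical stand-in data: a binary "is it virginica?" outcome
iris_bin <- iris %>%
  mutate(virginica = factor(if_else(Species == "virginica", "yes", "no"),
                            levels = c("yes", "no"))) %>%
  select(-Species)

data_split    <- initial_split(iris_bin, strata = virginica)
training_data <- training(data_split)
new_iris_data <- testing(data_split)

# Hypothetical workflow; the original model settings are not shown in the question
wf <- workflow() %>%
  add_formula(virginica ~ .) %>%
  add_model(
    boost_tree(trees = 50) %>%
      set_engine("xgboost") %>%
      set_mode("classification")
  )

predictive_fit <- fit(wf, training_data)

# 1. A fitted model predicts deterministically: repeated calls agree
p1 <- predict(predictive_fit, new_iris_data)
p2 <- predict(predictive_fit, new_iris_data)
identical(p1, p2)

# 2. Per-row class probabilities from the single fitted model
probs <- predict(predictive_fit, new_iris_data, type = "prob")

# 3. Refit on 100 bootstrap resamples, predict the new data each time,
#    and take the share of refits that predict "yes" for each row
boots <- bootstraps(training_data, times = 100, strata = virginica)

boot_preds <- map(boots$splits, function(s) {
  refit <- fit(wf, analysis(s))
  predict(refit, new_iris_data)$.pred_class == "yes"
})

new_iris_data$prop_virginica <- rowMeans(do.call(cbind, boot_preds))
```

The bootstrap proportion reflects how sensitive the prediction is to the training sample, which is not the same quantity as the model's predicted probability, so which one is appropriate depends on what the proportion is meant to represent.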
