TheGoat

Reputation: 2877

How to deploy R tidymodels ML model to GCP

I want to deploy my tidymodels ML model to GCP so it can serve up predictions to others.

I am following along with this video from Julia Silge, where she uses vetiver and Docker to deploy to RStudio Connect.

I am also following this video from Mark Edmondson, who created the googleCloudRunner package in R, on setting up a GCP project and defining the APIs and service accounts needed.

I have successfully authenticated to GCP: my .Renviron file contains all the variables needed for auto-authentication (the paths to my client secret and auth file), I have permission to write to my bucket, and I can create the plumber file and build the Docker image. However, I'm having issues running the Docker image on my Windows machine.
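
For context, here is the kind of sanity check I run before building, confirming the auto-authentication variables are visible to the R session (the variable names shown are the usual googleCloudStorageR / googleAuthR ones; yours may differ):

```{r}
# check that the auto-auth variables from .Renviron are picked up;
# empty strings here would explain authentication failures
Sys.getenv(c("GCS_AUTH_FILE", "GCS_DEFAULT_BUCKET", "GAR_CLIENT_JSON"))
```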

I get the following error, which appears to indicate that the Docker image cannot find the googleCloudStorageR package. I've manually modified the Dockerfile to install this package but continue to get the same error.

Here is the script I've copied from Julia's blog:

Any help to move forward with this project would be greatly appreciated.

```{r}
pacman::p_load(tidyverse, tidymodels, textrecipes, vetiver, pins,
               googleCloudRunner, googleCloudStorageR)
```

## Set up a new GCP project using cr_setup()
```{r}
# https://youtu.be/RrYrMsoIXsw?si=bJwEEqEzBGpIh_vg
# cr_setup()
```

```{r}
lego_sets <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-09-06/sets.csv.gz')
```

```{r}
glimpse(lego_sets)
```

```{r}

lego_sets %>%
  filter(num_parts > 0) %>%
  ggplot(aes(num_parts)) +
  geom_histogram(bins = 20) +
  scale_x_log10()
```

```{r}
set.seed(123)
lego_split <- lego_sets %>%
  filter(num_parts > 0) %>%
  transmute(num_parts = log10(num_parts), name) %>%
  initial_split(strata = num_parts)
```

```{r}
lego_train <- training(lego_split)
lego_test <- testing(lego_split)
```

```{r}
set.seed(234)
lego_folds <- vfold_cv(lego_train, strata = num_parts)
lego_folds
```

```{r}
lego_rec <- recipe(num_parts ~ name, data = lego_train) %>%
  step_tokenize(name) %>%
  step_tokenfilter(name, max_tokens = 200) %>%
  step_tfidf(name)

lego_rec
```


```{r}
svm_spec <- svm_linear(mode = "regression")
lego_wf <- workflow(lego_rec, svm_spec)
```

```{r}
set.seed(234)

doParallel::registerDoParallel()
lego_rs <- fit_resamples(lego_wf, lego_folds)
collect_metrics(lego_rs)
```


```{r}
final_fitted <- last_fit(lego_wf, lego_split)
collect_metrics(final_fitted)
```

```{r}
final_fitted %>%
  extract_workflow() %>%
  tidy() %>%
  arrange(-estimate)
```

```{r}
v <- final_fitted %>%
  extract_workflow() %>%
  vetiver_model(model_name = "lego-sets")
```

```{r}
v$metadata
```


## Publish and version model in GCS
```{r}
board <- board_gcs("ml-bucket-r")

board %>% vetiver_pin_write(v)
```
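
As a quick sanity check that the pin actually landed in the bucket, the versions can be listed (optional):

```{r}
# optional check: list versions of the pin just written to GCS
board %>% pin_versions("lego-sets")
```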


```{r}  
vetiver_write_plumber(board, "lego-sets")
```

```{r}
vetiver_write_docker(v)
```

```{bash}
docker build -t lego-sets .
```

## Run the Docker container, passing in the environment variables
```{bash}
docker run --env-file .Renviron --rm -p 8000:8000 lego-sets
```

(Screenshot of the error when running the Docker image: the container cannot find the googleCloudStorageR package.)
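
For reference, once the container does start, the endpoint can be smoke-tested from R (a sketch assuming the vetiver-generated plumber file and the default port 8000):

```{r}
# hypothetical smoke test once the container is running locally;
# the model expects a `name` column, as in the recipe above
endpoint <- vetiver_endpoint("http://localhost:8000/predict")
predict(endpoint, tibble(name = "Millennium Falcon"))
```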

Upvotes: 0

Views: 63

Answers (1)

Julia Silge

Reputation: 11663

When you run the code to create your Dockerfile, can you try passing in some additional_pkgs to get the right packages installed into the Docker container?

vetiver_write_docker(v, additional_pkgs = required_pkgs(board))

Check out the documentation here, which says this about the argument:

additional_pkgs

A character vector of additional package names to add to the Docker image. For example, some boards like pins::board_s3() require additional software; you can use required_pkgs(board) here.
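
In your setup that would look something like this (a sketch reusing the GCS board from the question, so that required_pkgs() can pick up googleCloudStorageR):

```{r}
# reuse the GCS board so the packages it needs end up in the image
board <- board_gcs("ml-bucket-r")
vetiver_write_docker(v, additional_pkgs = required_pkgs(board))
```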

Upvotes: 1
