Reputation: 21
When training regression models with the text package in R, the size of the model object grows with the number of training data points, resulting in unnecessarily large model objects. The models are created using the parsnip package with the glmnet engine. Because R avoids duplicating data in memory, components of an object can share the same underlying memory, which makes it difficult to tell which components/attributes of the model actually take up the space. For instance, object_size(model) shows 700 MB, but object_size(model$final_recipe) and object_size(model$final_model) each show about 698 MB, so these numbers do not reveal the actual size of the individual components.
Example:
object_size(model) # 700Mb
object_size(model$final_recipe) # 698Mb
object_size(model$final_model) # 698Mb
When I remove the final_recipe element (just as an example, not something I would do in practice):
model$final_recipe <- NULL
The size of the model is still 700Mb:
object_size(model) # 700Mb
Upvotes: 2
Views: 103
Reputation: 226
The butcher package likely does what you want. You can inspect your fitted model with butcher::weigh() to find out which parts are so big, and use butcher::butcher() to reduce the size of the object without losing the ability to predict with it:
library(butcher)
library(lobstr)
our_model <- function() {
  some_junk_in_the_environment <- runif(1e6) # we didn't know about
  lm(mpg ~ ., data = mtcars)
}
obj_size(our_model())
#> 8.02 MB
small_lm <- lm(mpg ~ ., data = mtcars)
obj_size(small_lm)
#> 22.22 kB
big_lm <- our_model()
weigh(big_lm)
#> # A tibble: 25 × 2
#>    object            size
#>    <chr>            <dbl>
#>  1 terms         8.01
#>  2 qr.qr         0.00666
#>  3 residuals     0.00286
#>  4 fitted.values 0.00286
#>  5 effects       0.0014
#>  6 coefficients  0.00109
#>  7 call          0.000728
#>  8 model.mpg     0.000304
#>  9 model.cyl     0.000304
#> 10 model.disp    0.000304
#> # ℹ 15 more rows
butchered_lm <- butcher(big_lm)
obj_size(butchered_lm)
#> 22.74 kB
predict(butchered_lm, mtcars[1:2,])
#>     Mazda RX4 Mazda RX4 Wag
#>      22.59951      22.11189
Created on 2024-01-22 with reprex v2.0.2
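The pattern in the question (every component reports nearly the full size, and removing one component changes nothing) is what you would expect if several components hold a reference to the same large environment. The following is a minimal sketch of that situation, not a diagnosis of the text package itself; make_parts() and its contents are hypothetical stand-ins for a fitting function.

```r
library(lobstr)

# Hypothetical stand-in for a fitting function: both returned components
# capture the function's environment, which also holds `big`.
make_parts <- function() {
  big <- runif(1e6)      # roughly 8 MB that is never used again
  list(
    a = function() 1,    # closure: encloses the calling environment
    b = y ~ x            # formula: keeps it as the .Environment attribute
  )
}

parts <- make_parts()
size_all <- obj_size(parts)
size_a   <- obj_size(parts$a)
size_b   <- obj_size(parts$b)

# Dropping one component does not free the shared environment,
# because the other component still references it.
parts$a <- NULL
size_after <- obj_size(parts)

# Sizes in MB; all four should be close to 8
sizes <- list(all = size_all, a = size_a, b = size_b, after = size_after)
print(sapply(sizes, as.numeric) / 1e6)
```

If your model behaves like this, weigh() should point at the element that carries the environment, and butcher() (or its axe_env() step) is the tool intended to strip it.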
Upvotes: 2
Reputation: 2420
If you only care about the predictions, you can extract the coefficients as a matrix and predict with a matrix multiplication. This example shows that the result is the same as predict(model, data).
library(glmnet)
#> Loading required package: Matrix
#> Loaded glmnet 4.1-7
x <- mtcars[, c("cyl", "wt", "gear")] |> as.matrix()
y <- mtcars$mpg
fit1 <- cv.glmnet(x, y)
# extract coefficients
b <- coef(fit1, "lambda.min")
# compare predictions
data.frame(
  y,
  x,
  pred1 = as.vector(cbind(1, x) %*% b), # predict with matrix multiplication
  pred2 = as.vector(predict(fit1, x, "lambda.min"))
) |>
  head()
#>                      y cyl    wt gear    pred1    pred2
#> Mazda RX4         21.0   6 2.620    4 22.16138 22.16138
#> Mazda RX4 Wag     21.0   6 2.875    4 21.39066 21.39066
#> Datsun 710        22.8   4 2.320    4 25.90130 25.90130
#> Hornet 4 Drive    21.4   6 3.215    3 20.36304 20.36304
#> Hornet Sportabout 18.7   8 3.440    3 16.84980 16.84980
#> Valiant           18.1   6 3.460    3 19.62254 19.62254
# compare sizes
object.size(fit1)
#> 26496 bytes
object.size(b)
#> 1976 bytes
Created on 2024-01-19 with reprex v2.0.2
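To actually store and reuse the small object, something along these lines might work. It is a sketch under assumptions: predict_from_coef() is a hypothetical helper (not part of glmnet), it only applies to the default gaussian family with an identity link, and any preprocessing your recipe performs would have to be applied to the new data before calling it.

```r
library(glmnet)

set.seed(1)  # cv.glmnet assigns folds at random
x <- as.matrix(mtcars[, c("cyl", "wt", "gear")])
y <- mtcars$mpg
fit <- cv.glmnet(x, y)

# Keep only a named numeric vector: "(Intercept)", "cyl", "wt", "gear"
beta <- as.matrix(coef(fit, s = "lambda.min"))[, 1]

# Hypothetical helper: linear prediction from stored coefficients.
# Columns are matched by name so their order in newx does not matter.
predict_from_coef <- function(beta, newx) {
  newx <- as.matrix(newx)[, names(beta)[-1], drop = FALSE]
  drop(cbind(1, newx) %*% beta)
}

# Round-trip through disk to mimic saving the small object
path <- tempfile(fileext = ".rds")
saveRDS(beta, path)
beta_loaded <- readRDS(path)

pred_small <- predict_from_coef(beta_loaded, x)
pred_full  <- as.vector(predict(fit, newx = x, s = "lambda.min"))
print(all.equal(unname(pred_small), pred_full))
```

Storing a plain named vector rather than the sparse coefficient matrix means the saved file can be read back and used without glmnet or Matrix being loaded.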
Upvotes: 0