Marta Karas

Reputation: 5155

How to suppress the "Updating loaded packages" window during R package installation in RStudio?

How can I disable or suppress the "Updating loaded packages" popup window that keeps showing up during R package installation? I am happy for it to be answered "No" every time, but I do not know how to set that up (I went through the install.packages() arguments and searched online, but found nothing).

Background: my goal is to compare the installation times of a large (2k) collection of packages. I want to run this overnight in a loop where each iteration (1) removes all but base-priority packages and (2) measures the installation time of one particular package. There must be no popup windows, since they halt the process.

sessionInfo when I start RStudio:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1   
> 


Upvotes: 1

Views: 1729

Answers (1)

hrbrmstr

Reputation: 78792

Consider a benchmarking harness along these lines:

#!/bin/bash

# Create file of all installed packages

Rscript -e 'writeLines(unname(installed.packages()[,1]), "installed-pkgs.txt")'

# Iterate over the file, benchmarking package load 3x (consider bumping this up)

while read -r pkg; do

  echo -n "Benchmarking package [${pkg}]"

  for iter in {1..3}; do

    echo -n "."

    Rscript --vanilla \
      -e 'args <- commandArgs(TRUE)' \
      -e 'invisible(suppressPackageStartupMessages(xdf <- as.data.frame(as.list(system.time(library(args[1], character.only=TRUE), FALSE)))))' \
      -e 'xdf$pkg <- args[1]' \
      -e 'xdf$iter <- args[2]' \
      -e 'xdf$loaded_namespaces <- I(list(loadedNamespaces()))' \
      -e 'saveRDS(xdf, file.path("data", sprintf("%s-%s.rds", args[1], args[2])))' \
      "${pkg}" \
      "${iter}"

  done

  echo ""

done <installed-pkgs.txt

I made a ~/projects/pkgbench directory with a data subdirectory and saved the script above there as pkgbench.sh. With it you get:

  • a clean (vanilla) R session for each run
  • 3 iterations per package (make it higher if you want)
  • one RDS file per iteration
  • the names (and therefore the count) of the namespaces loaded in the session after the package load, stored in each RDS file
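The harness calls saveRDS() into data/, so that directory has to exist before the first run. A minimal setup sketch, assuming the layout described above; the function name setup_bench_dir is made up for illustration:

```shell
#!/bin/bash

# Hypothetical helper: prepare a benchmark directory for the harness.
setup_bench_dir() {
  local dir="$1"

  # The harness writes one RDS file per iteration into data/.
  mkdir -p "${dir}/data"

  # If the harness script is already saved there, make it executable.
  if [ -f "${dir}/pkgbench.sh" ]; then
    chmod +x "${dir}/pkgbench.sh"
  fi
}
```

Calling `setup_bench_dir ~/projects/pkgbench` once before the first run should be enough; it is safe to call again because mkdir -p does not fail on an existing directory.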

When it runs (from a plain terminal session on your macOS box, not from inside RStudio, so no RStudio dialogs can interrupt it) you get progress output, one dot per iteration:

$ ./pkgbench.sh
Benchmarking package [abind]...
Benchmarking package [acepack]...
Benchmarking package [AER]...
Benchmarking package [akima]...

You can then analyze the results with something like this (I killed the benchmark after just a few packages):

library(hrbrthemes) # github/gitlab
library(tidyverse)

map_df(
  list.files("~/projects/pkgbench/data", full.names = TRUE),
  readRDS
) %>% tbl_df() %>% print() -> bench_df
## # A tibble: 141 x 8
##    user.self sys.self elapsed user.child sys.child pkg     iter  loaded_namespaces
##        <dbl>    <dbl>   <dbl>      <dbl>     <dbl> <chr>   <chr> <list>           
##  1   0.00500 0.00100  0.00700         0.        0. abind   1     <chr [9]>        
##  2   0.00600 0.00100  0.00700         0.        0. abind   2     <chr [9]>        
##  3   0.00600 0.00100  0.00600         0.        0. abind   3     <chr [9]>        
##  4   0.00500 0.00100  0.00600         0.        0. acepack 1     <chr [9]>        
##  5   0.00600 0.001000 0.00800         0.        0. acepack 2     <chr [9]>        
##  6   0.00500 0.00100  0.00600         0.        0. acepack 3     <chr [9]>        
##  7   1.11    0.0770   1.19            0.        0. AER     1     <chr [36]>       
##  8   1.04    0.0670   1.11            0.        0. AER     2     <chr [36]>       
##  9   1.07    0.0720   1.15            0.        0. AER     3     <chr [36]>       
## 10   0.136   0.0110   0.147           0.        0. akima   1     <chr [12]>       
## # ... with 131 more rows

group_by(bench_df, pkg) %>% 
  summarise(
    med_elapsed = median(elapsed), 
    ns_ct = length(loaded_namespaces[[1]])
  ) -> bench_sum

ggplot(bench_sum, aes("elapsed", med_elapsed)) +
  geom_violin(fill = ft_cols$gray) +
  ggbeeswarm::geom_quasirandom(color = ft_cols$yellow) +
  geom_boxplot(color = "white", fill="#00000000", outlier.colour = NA) +
  theme_ft_rc(grid="Y")


ggplot(bench_sum, aes(ns_ct, med_elapsed)) +
  geom_point(color = ft_cols$yellow) +
  geom_smooth(color = ft_cols$peach) + # shld prbly use something better than loess
  theme_ft_rc(grid = "XY")


If you are going to run it overnight, make sure macOS does not sleep or idle in the middle of the run: disable any heavyweight screensavers, prevent it from putting disks to sleep, and so on.
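One way to handle the sleep part is the caffeinate utility that ships with macOS, which holds a "stay awake" assertion for as long as the command it wraps is running. This is a sketch rather than a complete fix: the wrapper name run_awake is made up, it falls back to running the command directly where caffeinate is not available, and it does nothing about screensavers.

```shell
#!/bin/bash

# Hypothetical wrapper: run a command while asking macOS not to sleep.
run_awake() {
  if command -v caffeinate >/dev/null 2>&1; then
    # -i: prevent idle sleep
    # -m: prevent the disk from idle sleeping
    # -s: prevent system sleep (only honored on AC power)
    caffeinate -i -m -s "$@"
  else
    # Not on macOS (or caffeinate missing): just run the command.
    "$@"
  fi
}
```

For the harness above that would be `run_awake ./pkgbench.sh`.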

Note that I suppressed the package startup messages. You may want to use capture.output() instead, or compare timings with and without the suppression.

library() also takes all of these parameters:

library(
  package, 
  help, 
  pos = 2, 
  lib.loc = NULL,
  character.only = FALSE, 
  logical.return = FALSE,
  warn.conflicts = TRUE, 
  quietly = FALSE,
  verbose = getOption("verbose")
)

You may want to tweak those for various benchmarking runs as well.

I also only looked at the median of elapsed, the "what the package load felt like to the user" value. Consider examining the other system.time values in the data frame as well (user.self, sys.self, and so on).

If your Mac has enough CPU cores and a fast solid state disk, you could consider using GNU parallel with this harness to shorten the overall run. I'd definitely use more than 3 iterations per package if you do this, and be fairly conservative with the number of concurrent runs, since concurrent jobs compete for CPU and disk and add noise to the timings.
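A rough sketch of what the parallel variant could look like. Everything here is an assumption for illustration: bench-one.sh is a hypothetical script that wraps the single Rscript call from the harness and takes a package name and an iteration number as its two arguments, and the 5 iterations and 2 concurrent jobs are placeholder values. It also assumes the parallel on your PATH is GNU parallel and not another tool with the same name.

```shell
#!/bin/bash

# Emit one "pkg iter" line per benchmark job.
make_jobs() {
  local pkg_file="$1" iters="$2" pkg i
  while read -r pkg; do
    for ((i = 1; i <= iters; i++)); do
      printf '%s %s\n' "$pkg" "$i"
    done
  done < "$pkg_file"
}

# Only dispatch when GNU parallel, the package list, and the
# (hypothetical) per-job script are all present.
if command -v parallel >/dev/null 2>&1 \
    && [ -f installed-pkgs.txt ] \
    && [ -x ./bench-one.sh ]; then
  # --jobs 2 keeps concurrency low; --colsep splits each input line
  # into {1} (package name) and {2} (iteration number).
  make_jobs installed-pkgs.txt 5 |
    parallel --jobs 2 --colsep ' ' ./bench-one.sh {1} {2}
fi
```

Generating the job list separately makes it easy to inspect or shuffle before dispatch, so that the iterations of one package do not all run back to back.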

Upvotes: 2
