Wasabi

Reputation: 3061

Is it possible to "invert" function masking by different packages?

I maintain a package for use in the company where I work. Not all the programmers are as diligent as they arguably should be, and they often spam library() calls for every package they'll ever need. This often leads to functions from one package being masked by another package that is loaded later, which in turn leads to complications and weird bugs elsewhere in the code. These bugs are then solved by using the package::function syntax, which looks odd when it is applied to a single function while the calls around it, including calls to unmasked functions from the same package, are left unqualified.

Obviously, the proper thing to do in such cases would be to not load every single package with library(), saving that for the most commonly used packages in the code and then using the package::function syntax for the ones used less frequently.
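As a minimal sketch of why the package::function form sidesteps masking, the snippet below defines a local filter (a hypothetical stand-in for any masking definition) and compares an unqualified call with a qualified one. It uses only the stats package that ships with R.

```r
# A local definition masks filter() from every attached package.
filter <- function(...) "local filter"

x <- c(1, 2, 3, 4, 5)

# Unqualified: resolved along the search path, so the local function wins.
unqualified <- filter(x, rep(1 / 3, 3))

# Qualified: looked up directly in the stats namespace, so masking is irrelevant.
qualified <- stats::filter(x, rep(1 / 3, 3))

# Remove the local definition again so it does not linger in the workspace.
rm(filter)
```

The qualified call is a three-point moving average of x; the unqualified call never reaches stats at all.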

However, that's beyond my control. So I'm trying to help the programmers by adding a function to our internal package which handles package "priority", letting them define which package should take precedence when a given function name is called.

The following code works:

setDefaultPackage <- function(pkg, functions = NULL) {
    pkg = paste0("package:", pkg)
    
    if (is.null(functions)) functions <- utils::lsf.str(pkg)
    
    for(f in functions) {
        # only reassign if name doesn't yet exist or if associated environment is
        # NOT the global environment.
        
        if (exists(f)) {
            canUnmask <- tryCatch({
                getNamespaceName(environment(get(f, pos = parent.frame())))
                TRUE
            }, error = function (e) {
                FALSE
            })
        } else {
            canUnmask <- TRUE
        }
        
        if (canUnmask) {
            x <- tryCatch({
                get(pos = pkg, f)
            }, error = function(e) {
                stop("Package ", pkg, " does not have a function called ", f)
            })
            
            assign(f, x, pos = parent.frame())
        }
    }
}

library(stats)
environmentName(environment(filter))
#> [1] "stats"

# now mask stats::filter with dplyr::filter
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
environmentName(environment(filter))
#> [1] "dplyr"

# restore stats::filter to the top
setDefaultPackage("stats")
environmentName(environment(filter))
#> [1] "stats"

# restore dplyr::filter to the top
setDefaultPackage("dplyr")
environmentName(environment(filter))
#> [1] "dplyr"

# Fails to unmask packaged names masked by local names
filter <- function() {print(1)}
environmentName(environment(filter))
#> [1] "R_GlobalEnv"

setDefaultPackage("stats")
environmentName(environment(filter))
#> [1] "R_GlobalEnv" -- (unchanged!) --

Created on 2021-08-11 by the reprex package (v2.0.0)

The user can define whether they want filter to call stats::filter or dplyr::filter.

However, this function is very inelegant: it works by defining function names in the local frame (in the examples above, the global frame; if called within another function, the calling function's frame), flooding that frame's namespace.


In this case, the namespace is flooded with object names from both stats and dplyr, since I've called setDefaultPackage for both (the conflicting names currently point to stats since that was my last call).

A much cleaner way of doing this would be to simply modify the search path:

search()
#> [1] ".GlobalEnv"        "package:dplyr"     "tools:rstudio"     "package:stats"     "package:graphics"  "package:grDevices" "package:utils"     "package:datasets" 
#> [9] "package:methods"   "Autoloads"         "package:base"  

Since I loaded dplyr after stats, it comes first in the search list.

If I could simply modify that list, shuffling package:dplyr and package:stats around, that'd be fantastic. Is that possible?

That is, is there a way to put a specific package at the top of the search list?

someMagicalFunction("stats")
search()
#> [1] ".GlobalEnv"      "package:stats"     "package:dplyr"     "tools:rstudio"     "package:graphics"  "package:grDevices" "package:utils"     "package:datasets" 
#> [9] "package:methods" "Autoloads"         "package:base"

Upvotes: 1

Views: 199

Answers (1)

Onyambu

Reputation: 79208

You could use the following customized function:

setDefaultPackage <- function(pkg, functions = NULL){
  pkg1 <- paste0("package:", pkg)
  nms <- paste(pkg, 'functions', sep = '_')
  if (is.null(functions))  {
    if (any(search() == pkg1)) 
      detach(pkg1, character.only = TRUE)
    library(pkg, character.only = TRUE)
  }
  else {
    if (any(search() == nms)) 
      detach(nms, character.only = TRUE)
    env <- list2env(mget(functions, as.environment(pkg1)))
    attach(env, name = nms)
  }
}

Example:

> library(dplyr)
> search()
 #> [1] ".GlobalEnv"        "package:dplyr"     "tools:rstudio"     "package:stats"    
 #> [5] "package:graphics"  "package:grDevices" "package:utils"     "package:datasets" 
 #> [9] "package:methods"   "Autoloads"         "package:base"   

Now, if you want to prioritize all the functions in the stats package:

> setDefaultPackage('stats')

#> Attaching package: ‘stats’

#> The following objects are masked from ‘package:dplyr’:

#>    filter, lag

> search()
#> [1] ".GlobalEnv"        "package:stats"     "package:dplyr"     "tools:rstudio"    
#> [5] "package:graphics"  "package:grDevices" "package:utils"     "package:datasets" 
#> [9] "package:methods"   "Autoloads"         "package:base"   

We see that the stats package now comes before the dplyr package in the search path, so unqualified calls to lag and filter resolve to the stats versions.
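One way to confirm which definition wins after such a reordering is utils::find(), which lists the search-path entries that contain a name, in lookup order. The sketch below is an illustration only: so that it runs with base R alone, a throwaway attached environment (hypothetically named fake_pkg) stands in for dplyr.

```r
# Stand-in for a package that masks stats::filter ("fake_pkg" is a made-up name).
mask_env <- new.env()
assign("filter", function(...) "masked", envir = mask_env)
attach(mask_env, name = "fake_pkg")

# fake_pkg was attached last, so it is listed before package:stats.
before <- utils::find("filter")

# Detach and re-attach stats, which places it at position 2 of the search path.
detach("package:stats", character.only = TRUE)
library(stats)

# Now package:stats is listed first, so filter() resolves to stats::filter.
after <- utils::find("filter")
winner <- environmentName(environment(filter))

# Clean up the stand-in environment.
detach("fake_pkg", character.only = TRUE)
```

Comparing before and after shows the two entries swapping places, which is the same effect the search() output above shows for stats and dplyr.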

Let's revert it so that dplyr comes before stats again:

> setDefaultPackage('dplyr')
> search()
 #> [1] ".GlobalEnv"        "package:dplyr"     "tools:rstudio"     "package:stats"    
 #> [5] "package:graphics"  "package:grDevices" "package:utils"     "package:datasets" 
 #> [9] "package:methods"   "Autoloads"         "package:base"   

What if, from the beginning, we want to use lag from dplyr but filter from stats? That is, dplyr still comes before stats in the search path, just like before. Then you could run:

setDefaultPackage('stats', 'filter')

Now filter resolves to the stats version, while lag still comes from dplyr.
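The per-function branch can be checked in isolation. The following is a rough sketch rather than a test of the function above: it repeats the list2env/mget/attach steps by hand, again with a hypothetical fake_pkg environment standing in for dplyr so that only base R is needed.

```r
# Stand-in for a package that masks both filter and lag ("fake_pkg" is made up).
mask_env <- new.env()
assign("filter", function(...) "masked filter", envir = mask_env)
assign("lag", function(...) "masked lag", envir = mask_env)
attach(mask_env, name = "fake_pkg")

# Copy only filter out of stats and attach that copy ahead of everything else.
picked <- list2env(mget("filter", envir = as.environment("package:stats")))
attach(picked, name = "stats_functions")

# filter is found first in the new entry; lag is still found first in fake_pkg.
first_filter <- utils::find("filter")[1]
first_lag <- utils::find("lag")[1]
filter_home <- environmentName(environment(filter))

# Clean up both attached environments.
detach("stats_functions", character.only = TRUE)
detach("fake_pkg", character.only = TRUE)
```

Because only the selected names are copied into the attached environment, the rest of the search path keeps its original order.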

Upvotes: 1
