Reputation: 3061
I maintain a package for use in the company where I work. Not all the programmers are as diligent as they arguably should be, and they often spam library() calls for every package they'll ever need. This often leads to functions from one package being masked by another that is loaded later, which in turn leads to complications and weird bugs elsewhere in the code. These bugs are then fixed with the package::function syntax, which looks out of place when it's used for one function but nothing else around it, not even other functions from the same package (which weren't masked).
Obviously, the proper thing to do in such cases would be to not load every single package with library(), saving that for the most commonly used packages in the code, and to use the package::function syntax for the ones used less frequently.
However, that's beyond my control. So I'm trying to help the programmers by adding a function to our internal package which handles package "priority", letting them define which package should take precedence when a given function name is called.
The following code works:
setDefaultPackage <- function(pkg, functions = NULL) {
  pkg <- paste0("package:", pkg)
  if (is.null(functions)) functions <- utils::lsf.str(pkg)
  for (f in functions) {
    # only reassign if name doesn't yet exist or if associated environment is
    # NOT the global environment.
    if (exists(f)) {
      canUnmask <- tryCatch({
        getNamespaceName(environment(get(f, pos = parent.frame())))
        TRUE
      }, error = function(e) {
        FALSE
      })
    } else {
      canUnmask <- TRUE
    }
    if (canUnmask) {
      x <- tryCatch({
        get(pos = pkg, f)
      }, error = function(e) {
        stop(pkg, " does not have a function called ", f)
      })
      assign(f, x, pos = parent.frame())
    }
  }
}
library(stats)
environmentName(environment(filter))
#> [1] "stats"
# now mask stats::filter with dplyr::filter
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
environmentName(environment(filter))
#> [1] "dplyr"
# restore stats::filter to the top
setDefaultPackage("stats")
environmentName(environment(filter))
#> [1] "stats"
# restore dplyr::filter to the top
setDefaultPackage("dplyr")
environmentName(environment(filter))
#> [1] "dplyr"
# Fails to unmask packaged names masked by local names
filter <- function() {print(1)}
environmentName(environment(filter))
#> [1] "R_GlobalEnv"
setDefaultPackage("stats")
environmentName(environment(filter))
#> [1] "R_GlobalEnv" -- (unchanged!) --
Created on 2021-08-11 by the reprex package (v2.0.0)
The user can define whether they want filter to call stats::filter or dplyr::filter.
However, this function is very inelegant: it works by defining function names in the local frame (in the examples above, the global frame; if called within another function, the calling function's frame), flooding that frame's namespace.
In this case, the namespace is flooded with object names from both stats and dplyr, since I've called setDefaultPackage for both (the conflicting names currently point to stats since that was my last call).
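To make the flooding concrete, here is a minimal sketch that copies the functions of stats into a throwaway environment instead of the global one. The names frame and n_copied are made up for this illustration; the loop body mirrors what setDefaultPackage does for each name, minus the masking checks.

```r
# Stand-in for the calling frame, so the global environment stays untouched.
# (`frame` and `n_copied` are hypothetical names for this illustration.)
frame <- new.env()

# Copy every function that stats puts on the search path into that frame,
# one name at a time, as the loop in setDefaultPackage() does.
for (f in utils::lsf.str("package:stats")) {
  assign(f, get(f, pos = "package:stats"), envir = frame)
}

# Number of bindings that now live in the frame.
n_copied <- length(ls(frame))
n_copied
```

Every one of those bindings would end up in the user's workspace if the call were made from the global environment.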
A much cleaner way of doing this would be to simply modify the search path:
search()
#> [1] ".GlobalEnv" "package:dplyr" "tools:rstudio" "package:stats" "package:graphics" "package:grDevices" "package:utils" "package:datasets"
#> [9] "package:methods" "Autoloads" "package:base"
Since I loaded dplyr after stats, it comes before stats in the search list.
If I could simply modify that list, swapping package:dplyr and package:stats, that'd be fantastic. Is that possible?
That is, is there a way to put a specific package at the top of the search list?
someMagicalFunction("stats")
search()
#> [1] ".GlobalEnv" "package:stats" "package:dplyr" "tools:rstudio" "package:graphics" "package:grDevices" "package:utils" "package:datasets"
#> [9] "package:methods" "Autoloads" "package:base"
Upvotes: 1
Views: 199
Reputation: 79208
You could use the following customized function:
setDefaultPackage <- function(pkg, functions = NULL) {
  pkg1 <- paste0("package:", pkg)
  nms <- paste(pkg, 'functions', sep = '_')
  if (is.null(functions)) {
    # whole package: detach it (if attached) and re-attach it at position 2
    if (any(search() == pkg1))
      detach(pkg1, character.only = TRUE)
    library(pkg, character.only = TRUE)
  }
  else {
    # selected functions only: attach them in their own environment,
    # replacing any environment attached by a previous call
    if (any(search() == nms))
      detach(nms, character.only = TRUE)
    env <- list2env(mget(functions, as.environment(pkg1)))
    attach(env, name = nms)
  }
}
Example:
> library(dplyr)
> search()
#> [1] ".GlobalEnv" "package:dplyr" "tools:rstudio" "package:stats"
#> [5] "package:graphics" "package:grDevices" "package:utils" "package:datasets"
#> [9] "package:methods" "Autoloads" "package:base"
Now, if you want to prioritize all the functions in the stats package:
> setDefaultPackage('stats')
#> Attaching package: ‘stats’
#> The following objects are masked from ‘package:dplyr’:
#> filter, lag
> search()
#> [1] ".GlobalEnv" "package:stats" "package:dplyr" "tools:rstudio"
#> [5] "package:graphics" "package:grDevices" "package:utils" "package:datasets"
#> [9] "package:methods" "Autoloads" "package:base"
We see that the stats package now comes before the dplyr package, meaning that lag and filter now resolve to the versions from stats.
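If you want to check which copy of a name wins without reading the whole search path, utils::find() lists every attached entry that defines the name, in lookup order. The sketch below does not assume dplyr is installed: it simulates a masking package by attaching an environment under the made-up name demo_mask.

```r
# Simulate a masking package with an attached environment that defines
# filter(). "demo_mask" is a hypothetical name used only for this example.
mask <- new.env()
assign("filter", function(...) "masked", envir = mask)
attach(mask, name = "demo_mask")

# find() returns the search-path entries that define the name, in the
# order R would look them up; the first entry is the one that gets called.
hits <- utils::find("filter")
hits

# Clean up the search path again.
detach("demo_mask", character.only = TRUE)
```

With dplyr and stats both attached, the same call to find("filter") would show which of the two comes first.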
Let's revert it so that dplyr comes before stats:
> setDefaultPackage('dplyr')
> search()
#> [1] ".GlobalEnv" "package:dplyr" "tools:rstudio" "package:stats"
#> [5] "package:graphics" "package:grDevices" "package:utils" "package:datasets"
#> [9] "package:methods" "Autoloads" "package:base"
What if, from the beginning, we want to use lag from dplyr but filter from stats? That is, dplyr still comes before stats in the search path, just like before. Then you could run:
setDefaultPackage('stats', 'filter')
Now the filter to be used is from stats, while lag is from dplyr.
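For reference, this is roughly what the second branch of the function does for that call, written out step by step. It is only a sketch: the variables picked, top and src are made up here, and it runs with stats alone, so it shows the mechanism rather than the interaction with dplyr.

```r
# Build an environment holding only the chosen function, taken from stats.
picked <- list2env(mget("filter", envir = as.environment("package:stats")))

# Attach it at position 2, directly after the global environment.
# "stats_functions" is the name the function above would generate.
attach(picked, name = "stats_functions")

# The new entry sits ahead of every attached package ...
top <- search()[2]
# ... so an unqualified filter() resolves to the copy from stats.
src <- environmentName(environment(get("filter")))

# Clean up the search path again.
detach("stats_functions", character.only = TRUE)
```

Because only filter is copied into the attached environment, every other name keeps resolving through the normal package order.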
Upvotes: 1