Tyler Rinker

Reputation: 110054

Data inside a function (package creation)

If I need to use a data set inside a function (as a lookup table) in a package I'm creating, do I need to explicitly load the data set inside the function?

The function and the data set are both part of my package.

Is this the correct way to use that data set inside the function:

foo <- function(x){
    x <- dataset_in_question
}

or is this better:

foo <- function(x){
    x <- data(dataset_in_question)
}

or is there some approach I'm not thinking of that's correct?

Upvotes: 15

Views: 1474

Answers (3)

Tyler Rinker

Reputation: 110054

One can just save the data set in an .rda file in the package's R folder (R/sysdata.rda), as described by Hadley here: http://r-pkgs.had.co.nz/data.html#data-sysdata

Matthew Jockers uses this approach in the syuzhet package for data sets including the bing data set as seen at ~line 452 here: https://github.com/mjockers/syuzhet/blob/master/R/syuzhet.R

bing is not available to the user but is available inside the package, as demonstrated by: syuzhet:::bing

Essentially, the command devtools::use_data(..., internal = TRUE) sets everything up the way it's needed.
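As a rough base-R sketch of what that setup amounts to (lookup_tbl, its columns, and the mypkg directory are made up for illustration; a fresh environment stands in for the package namespace):

```r
# Hypothetical lookup table; the name and columns are illustrative only.
lookup_tbl <- data.frame(key = c("a", "b", "c"), value = c(1, 2, 3))

# Stand-in for a package source directory (a real package already has R/).
pkg_dir <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# Objects saved in R/sysdata.rda are internal to the package:
# visible to its functions, not exported to the user.
sysdata_path <- file.path(pkg_dir, "R", "sysdata.rda")
save(lookup_tbl, file = sysdata_path)
rm(lookup_tbl)

# Simulate the namespace by loading the file into a fresh environment.
ns <- new.env()
load(sysdata_path, envir = ns)

# A package function simply refers to the table by name; no data() call.
foo <- function(x) lookup_tbl$value[match(x, lookup_tbl$key)]
environment(foo) <- ns

foo(c("c", "a"))
```

In an actual package the environment juggling is unnecessary: anything stored in R/sysdata.rda is found by name from the package's own functions.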

Upvotes: 1

nachti

Reputation: 1100

For me it was necessary to use get() in addition to LazyData: true in the DESCRIPTION file (see point 3 of the post by @Henrik) to get rid of the NOTE no visible binding for global variable .... My R version is 3.2.3.

foo <- function(x){
    get("dataset_in_question")
}

So LazyData makes dataset_in_question directly accessible (without calling data("dataset_in_question", envir = environment())), and get() is only there to satisfy R CMD check.

HTH

Upvotes: 1

Henrik

Reputation: 14460

There was a recent discussion about this topic (in the context of package development) on R-devel, numerous points of which are relevant to this question:

  1. If only the options you provide are applicable to your example, R himself (i.e., Brian Ripley) tells you to do:

    foo <- function(x){
       data("dataset_in_question")
    }
    
  2. This approach will, however, throw a NOTE in R CMD check, which can be avoided in upcoming versions of R (or currently in R-devel) by using the globalVariables() function, added by John Chambers.

  3. The 'correct' approach (i.e., the one advocated by Brian Ripley and Peter Dalgaard) would be to use the LazyData option for your package. See this section of "Writing R Extensions".
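A minimal sketch of how points 2 and 3 might look together in a package source file. The key and value columns are assumptions made for illustration, and the comments about DESCRIPTION and data/ describe the usual layout rather than anything specific to your package:

```r
# DESCRIPTION (package config, not R code) would contain the line:
#   LazyData: true
# and the data set would be saved as data/dataset_in_question.rda.

# Declare the name so that R CMD check's code analysis does not report
# "no visible binding for global variable".
utils::globalVariables("dataset_in_question")

# Hypothetical lookup: assumes dataset_in_question has columns
# `key` and `value`.
foo <- function(x) {
    dataset_in_question$value[match(x, dataset_in_question$key)]
}
```

With LazyData enabled, the data set is found by name once the package is attached, so the function body needs no data() call.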

Btw: I do not fully understand how your first approach is supposed to work. What should x <- dataset_in_question do? Is dataset_in_question a global variable, or is it defined previously?

Upvotes: 12
