Zeke

Reputation: 669

Best way to store and update package data as a dev in R?

I'm developing a package that needs to store and manipulate data in the background over the course of an R session. Primarily, this will be a data frame recording what the user has downloaded so far, updated as they download more. Many of the functions in my package will use this data, for example to check whether they need to download anything else. I expect that in some cases this data frame might become relatively large (thousands of rows).

I know that one easy way of storing global variables for a package is to store them as options with the options() function. I can also imagine calling a stateful function at the beginning of each session, which would serve as the gateway for this data.

But I get the feeling that options are meant to be pretty small, and that stateful functions might get confusing and require some work to get operating smoothly. Is there a better way to store information like this? If it helps, I don't want the user to be able to directly manipulate the package data.

Upvotes: 1

Views: 183

Answers (1)

Waldi

Reputation: 41220

You should store dataframes in the /data folder of the package.

As explained here, each dataset should be stored in a .RData file containing only one object, having the same name, and created by the save command.
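As a rough sketch of that step, the file could be created as below. The data frame contents are made up for illustration, and the file is written to a temporary directory here; in a real package the target would be data/DataSetName.RData.

```r
# Hypothetical dataset; the object name must match the file name
DataSetName <- data.frame(bar = c("a", "b"), foo = c(1, 2))

# Temporary stand-in for the package's data/ folder (assumption for this sketch)
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

# One object per file, saved with save()
rdata_path <- file.path(data_dir, "DataSetName.RData")
save(DataSetName, file = rdata_path)

# Load into a scratch environment to check what the file contains
loaded <- new.env()
load(rdata_path, envir = loaded)
ls(loaded)
```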

Like exported functions, data objects should be documented. One possibility is to create a Roxygen .R script in the /R folder that uses the @format tag to describe the content of the dataset and ends with the dataset name in quotes, "DataSetName":

#' My data set
#'
#' some useful data
#'
#' @format a dataframe with 464 rows and 2 variables
#' * bar : bar name
#' * foo : foo value
#'
#' @source created by me 
"DataSetName"

[EDIT] If you want to store data for the package during the current session, you can create a hidden environment for the package in one of the package's .R scripts:

.pkg.env <- new.env()

#' Set hidden variable
#'
#' @param var Object to store in the hidden environment under its own name.
pkg.set <- function(var) {
  # Capture the name of the object passed in by the caller
  varname <- deparse(substitute(var))
  if (exists(varname, envir = parent.frame())) {
    assign(varname, var, envir = .pkg.env)
  } else {
    stop(paste(varname, "doesn't exist"))
  }
}

#' Get hidden variable
#'
#' @param var Unquoted name of the object to retrieve.
pkg.get <- function(var) {
  varname <- deparse(substitute(var))
  # Returns NULL if nothing has been stored under that name
  .pkg.env[[varname]]
}

Neither function is exported, so users don't see them. You can still call them from your package's other functions:

TempData <- data.frame(test="Test")
pkg.set(TempData)
pkg.get(TempData)
  test
1 Test
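For the download log described in the question, the same hidden-environment idea might look like the sketch below. The helper names record_download and already_downloaded and the columns file and size are hypothetical, chosen only for illustration.

```r
# Hidden environment holding the session state (same idea as .pkg.env above)
.pkg.env <- new.env(parent = emptyenv())

# Hypothetical helper: append one downloaded file to the session log
record_download <- function(file, size) {
  new_row <- data.frame(file = file, size = size, stringsAsFactors = FALSE)
  if (is.null(.pkg.env$downloads)) {
    .pkg.env$downloads <- new_row
  } else {
    .pkg.env$downloads <- rbind(.pkg.env$downloads, new_row)
  }
  invisible(.pkg.env$downloads)
}

# Hypothetical helper: was this file already downloaded in this session?
already_downloaded <- function(file) {
  log <- .pkg.env$downloads
  !is.null(log) && file %in% log$file
}

record_download("a.csv", 120)
record_download("b.csv", 45)
already_downloaded("a.csv")  # TRUE
already_downloaded("c.csv")  # FALSE
```

Because environments are modified in place, every function in the package sees the updated log without it being passed around or placed in the user's workspace.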

Upvotes: 1
