Nick Allen


R - Automatic Creation of Data Packages

I have data on a server in the form of SAS data sets that are updated daily. I would like these to be packaged automatically into R packages and then dropped into a package repository on the server. This should allow my co-workers and me to work with the packaged data in R and keep it up to date as it changes each day, simply by calling install.packages and update.packages.
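
For install.packages() and update.packages() to find the daily builds, the server directory needs the layout of a CRAN-style repository: source tarballs under src/contrib plus a PACKAGES index. The following is a minimal sketch, assuming the tarball has already been built. `publish_to_repo()` is a hypothetical helper, while `contrib.url()` and `tools::write_PACKAGES()` are standard R functions.

```r
## Sketch (hypothetical helper): copy a built source tarball into a
## CRAN-style repository directory and regenerate the repository index.
publish_to_repo <- function(tarball, repo_dir) {
  ## Source packages live under <repo>/src/contrib
  contrib <- contrib.url(repo_dir, type = "source")
  dir.create(contrib, recursive = TRUE, showWarnings = FALSE)
  file.copy(tarball, contrib, overwrite = TRUE)
  ## Rewrite the PACKAGES index files that install.packages() reads
  tools::write_PACKAGES(contrib, type = "source")
  invisible(contrib)
}
```

Co-workers could then run something like `install.packages("sasdata", repos = "file:///path/to/repo", type = "source")`, where the package name and path are placeholders.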

What is a good way to implement this automatic creation of data packages?

I have written some code that pulls in the data set, converts it, and then uses package.skeleton() to create the package structure dynamically. I then have to overwrite the DESCRIPTION file to update the version, along with some other edits. Finally, I have to build and check the package (R CMD build and R CMD check) to package the whole lot and drop it in the repository. Is there a better way?
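
Rather than overwriting DESCRIPTION wholesale, the Version field can be edited in place with read.dcf() and write.dcf(). The following sketch assumes a date-based version scheme, which the question does not specify, and uses a hypothetical helper name, `bump_version()`.

```r
## Sketch (hypothetical helper): set the Version field of an existing
## DESCRIPTION to the snapshot date, leaving all other fields untouched.
bump_version <- function(pkg_dir, date = Sys.Date()) {
  desc_file <- file.path(pkg_dir, "DESCRIPTION")
  ## read.dcf() returns a one-row character matrix, one column per field
  desc <- read.dcf(desc_file)
  lt <- as.POSIXlt(date)
  ## e.g. 31 Jan 2024 -> "2024.1.31" (no leading zeros in components)
  desc[1, "Version"] <- sprintf("%d.%d.%d", lt$year + 1900L, lt$mon + 1L, lt$mday)
  write.dcf(desc, desc_file)
  invisible(desc[1, "Version"])
}
```

Because package versions are compared component by component as integers, 2024.2.1 sorts after 2024.1.31, so update.packages() should treat each day's build as newer than the last.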


Answers (2)

Ramnath

I would recommend using a makefile to automate the conversion of the data sets. This is especially useful when there are multiple data sets and the conversion is time-consuming, because make only reconverts files whose source has changed. I am assuming that the SAS files are in a directory called sas. Here is the makefile.

Running make data reads every *.sas7bdat file in the sas directory with the sas7bdat package and saves it as an .rda file of the same name in the package's data directory. You can automate further by adding package installation to the makefile and using a continuous integration system such as Travis CI, so that your R package is always up to date.

I have created a sample repo to illustrate the idea. This is an interesting question, and I think it makes sense to develop a simple, flexible and robust approach to data packaging.

SAS_FILES = $(wildcard sas/*.sas7bdat)
RDA_FILES = $(patsubst sas/%.sas7bdat,data/%.rda,$(SAS_FILES))

.PHONY: data
data: $(RDA_FILES)

# Convert each SAS data set into an .rda file named after the data set
# (recipe lines must be indented with a tab, not spaces)
data/%.rda: sas/%.sas7bdat
	@mkdir -p data
	Rscript -e "library(sas7bdat); fname <- tools::file_path_sans_ext(basename('$<')); assign(fname, read.sas7bdat('$<')); save(list = fname, file = '$@')"
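
For readers who would rather stay inside R, the make rule can be approximated in plain R. This is a sketch and not part of the original answer: `convert_sas_dir()` is a hypothetical helper, it assumes the sas7bdat package is installed, and the `reader` argument exists only so the function can be tried with other file types.

```r
## Sketch (hypothetical helper): convert sas/*.sas7bdat into data/<name>.rda,
## skipping any file whose .rda is already at least as new as its source
## (the same timestamp test make applies to a pattern rule).
convert_sas_dir <- function(sas_dir = "sas", data_dir = "data",
                            pattern = "\\.sas7bdat$",
                            reader = function(f) sas7bdat::read.sas7bdat(f)) {
  dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
  src <- list.files(sas_dir, pattern = pattern, full.names = TRUE)
  converted <- character(0)
  for (f in src) {
    name <- tools::file_path_sans_ext(basename(f))
    out <- file.path(data_dir, paste0(name, ".rda"))
    ## Rebuild only when the .rda is missing or older than its source
    if (!file.exists(out) || file.mtime(out) < file.mtime(f)) {
      env <- new.env()
      assign(name, reader(f), envir = env)
      save(list = name, envir = env, file = out)
      converted <- c(converted, out)
    }
  }
  invisible(converted)
}
```

The trade-off is that make gives you the dependency tracking for free and can drive other steps (build, install), whereas the R version avoids a make dependency on Windows machines.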


agstudy

One option is to create an R file under your package's data folder that loads the data:

data/
  sas_data.R

In sas_data.R, write the code that loads the data from the server. It should look something like this:

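## 'url' is a placeholder for the location of the data on your server
url <- "http://yourserver/path/to/sas_data.txt"
dest_file <- tempfile()
download.file(url, dest_file)
## process here
sas_data <- read.table(dest_file)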

Then load it with data():

data(sas_data)
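
This works because data() source()s any .R (or .r) file it finds in a data directory, so the download runs whenever data() is called. The sketch below tries the mechanism without a server. It assumes a throwaway directory under tempdir() and a small local data frame standing in for the real download; outside a package, data() also searches the data directory of the current working directory.

```r
## Sketch: build a throwaway 'data' directory containing an R script.
demo_dir <- file.path(tempdir(), "data_demo")
dir.create(file.path(demo_dir, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "## In the real package this would call download.file() and read the result",
  "sas_data <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))"
), file.path(demo_dir, "data", "sas_data.R"))

## With no package given, data() also looks in ./data of the working directory
old_wd <- setwd(demo_dir)
data(sas_data, envir = environment())
setwd(old_wd)
print(head(sas_data))
```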

