Reputation: 1467
I have data on a server in the form of SAS data sets that are updated daily. I would like these to be packaged auto-magically into R packages and then dropped in a package repository on the server. This should allow my co-workers and me to easily work with this packaged data in R and keep up to date as it changes each day by simply calling install.packages and update.packages.
What is a good way to implement this automatic creation of data packages?
I have written some code that pulls in the data set, converts it, and then uses package.skeleton() to dynamically create the package structure. I then have to overwrite the DESCRIPTION file to update the version, along with some other edits. Finally, I have to build and check the package (R CMD build and R CMD check) to package the whole lot and drop it in the repository. Is there a better way?
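For reference, the version-bump and publish steps described above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the helper names bump_description_version and publish_source_package are hypothetical, the date-based version scheme is just one possible choice (two builds on the same day would get the same version), and the repository is assumed to be a plain directory with the usual src/contrib layout.

```r
# Hypothetical helper: stamp a date-based version into <pkg_dir>/DESCRIPTION.
bump_description_version <- function(pkg_dir, date = Sys.Date()) {
  desc_path <- file.path(pkg_dir, "DESCRIPTION")
  desc <- read.dcf(desc_path)  # one-row character matrix, one column per field
  stopifnot("Version" %in% colnames(desc))
  # Normalise via package_version() so leading zeros are dropped (03 -> 3).
  new_version <- as.character(package_version(format(date, "%Y.%m.%d")))
  desc[1, "Version"] <- new_version
  write.dcf(desc, file = desc_path)
  invisible(new_version)
}

# Hypothetical helper: build the source tarball into <repo_dir>/src/contrib
# and regenerate the PACKAGES index that install.packages() reads.
publish_source_package <- function(pkg_dir, repo_dir) {
  pkg_path <- normalizePath(pkg_dir)  # resolve before changing directory
  contrib <- file.path(repo_dir, "src", "contrib")
  dir.create(contrib, recursive = TRUE, showWarnings = FALSE)
  old_wd <- setwd(contrib)            # R CMD build writes to the working dir
  on.exit(setwd(old_wd))
  status <- system2(file.path(R.home("bin"), "R"),
                    c("CMD", "build", shQuote(pkg_path)))
  if (status != 0) stop("R CMD build failed for ", pkg_path)
  tools::write_PACKAGES(".", type = "source")
}
```

Co-workers would then point install.packages or update.packages at the repository directory through the repos argument, using a file:// URL or whatever URL the server exposes.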
Upvotes: 2
Views: 120
Reputation: 55735
I would recommend using a makefile to automate the conversion of datasets. This is especially useful if there are multiple datasets and the conversion process is time-consuming.
I am assuming that the SAS files are in a directory called sas. The makefile is given at the end of this answer.
By typing make data, all the *.sas7bdat files are read from the sas directory using the sas7bdat package and saved as *.rda files of the same name in the data directory of the package. You can add more automation by adding package installation to the makefile and using a continuous integration system like Travis CI so that your R package is always up to date.
I have created a sample repo to illustrate my idea. This is an interesting question, and I think it makes sense to develop a simple, flexible and robust approach to data packaging.
SAS_FILES = $(wildcard sas/*.sas7bdat)
RDA_FILES = $(patsubst sas/%.sas7bdat, data/%.rda, $(SAS_FILES))

data: $(RDA_FILES)

data/%.rda: sas/%.sas7bdat
	Rscript -e "library(sas7bdat); library(tools); fname = file_path_sans_ext(basename('$<')); assign(fname, read.sas7bdat('$<')); save($(basename $(notdir $<)), file = '$@')"
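The one-line Rscript call in the recipe can be hard to read, so here is a rough sketch of the same conversion step as a standalone R function. The function name convert_to_rda and the reader argument are assumptions added for illustration (the reader argument only exists so the function can be tried without a real SAS file); the default reader is read.sas7bdat from the sas7bdat package, as in the makefile.

```r
# Hypothetical helper mirroring the makefile recipe: read one SAS file and
# save it as an .rda whose object name matches the file's base name.
convert_to_rda <- function(src, dest, reader = sas7bdat::read.sas7bdat) {
  obj_name <- tools::file_path_sans_ext(basename(src))
  env <- new.env()
  assign(obj_name, reader(src), envir = env)  # e.g. sas/sales.sas7bdat -> sales
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  save(list = obj_name, envir = env, file = dest)
  invisible(obj_name)
}
```

If this function were kept in a script inside the package repository, the recipe could source that script and call convert_to_rda('$<', '$@') instead of building the object name inline.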
Upvotes: 0
Reputation: 121608
What you can do is create an R file under your data folder to load the data:
data
--sas_data.R
In this sas_data.R file, you write the code that loads the data from the server. The code should be something like:
## urll and dest_file are placeholders for the server URL and a local file path
download.file(urll, dest_file)
## process here
sas_data = read.table(dest_file)
Then you call it using data:
data(sas_data)
Upvotes: 1