sh_student

Reputation: 389

How to hand R objects to C++ using Rcpp?

Hey, I am new to Rcpp and have the following problem (which is probably pretty straightforward but I could not figure it out). I first create a list, vectors and a data frame in R. I want to pass these R objects to C++ to modify the list and then return the modified list again back to R for further analysis.

I made up an example of a cpp file in RStudio:

#include <Rcpp.h>
using namespace Rcpp;

/*** R
mylist <- list(mat1 = data.frame(Col1 = c('id1','id2','id3'), Col2 = c(5,6,7), Col3 = as.factor(c('blue','green','black'))),
               mat2 = data.frame(Col1 = c('id1','id2','id3'), Col2 = c(5,6,7), Col3 = as.factor(c('blue','green','black'))),
               mat3 = data.frame(Col1 = c('id1','id2','id3'), Col2 = c(5,6,7), Col3 = as.factor(c('blue','green','black'))),
               mat4 = data.frame(Col1 = c('id1','id2','id3'), Col2 = c(5,6,7), Col3 = as.factor(c('blue','green','black'))))
myvector1 <- c(seq(8,10,1))
myvector2 <- c(seq(101,103,1))

mydataframe <- data.frame(Col1 = c('id3','id4','id5'), Col2 = seq(21,23,1), Col3 = as.factor(c('blue','green','black')))
*/


// [[Rcpp::export]]

// Some code modifying mylist with myvector1, myvector2 and mydataframe and returning the modified list again to R
// Let's say Col2 of mat1 of mylist shall be multiplied with myvector1 and Col2 of mat2 with myvector2.
// Col2 of mat4 shall be divided by Col2 of mydataframe
// And then the modified mylist should be returned
}

/*** R
# Some analysis with the new modified list generated in the C++ code
summary(mylist$mat2$Col2)
*/

How can I make an operation like this generally work (in reality my list is much larger)? Or is it better using an Rscript with the cxxfunction function? Thanks for the help!

Upvotes: 1

Views: 459

Answers (1)

Ralf Stubner

Reputation: 26823

It seems your question is how to combine R and C++ in practice using Rcpp. I see several approaches. Which one to choose depends on the amount of code, how often you do similar analyses, how likely you are to come back to this particular analysis in the future, etc. It is a good idea to try them out yourself so that you can form your own heuristics for when to choose which.

Single file

C++ plus R

If there is more C++ than R, it is useful to have a single .cpp file with a special R comment block. General structure:

#include <Rcpp.h>

// [[Rcpp::export]]
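void foo(Rcpp::List l, Rcpp::NumericVector v1, Rcpp::NumericVector v2, Rcpp::DataFrame df) {
    // A data frame is a list of columns, so it can be accessed as an Rcpp::List
    Rcpp::List mat2 = l["mat2"];
    // Col2 is a double vector, so this wraps the R data without copying it
    Rcpp::NumericVector Col2 = mat2["Col2"];
    // Overwrites the first element in place; the change is visible in R
    Col2(0) = 10;
}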

/*** R
mylist <- list(mat1 = data.frame(Col1 = c('id1','id2','id3'), Col2 = c(5,6,7), Col3 = as.factor(c('blue','green','black'))),
               mat2 = data.frame(Col1 = c('id1','id2','id3'), Col2 = c(5,6,7), Col3 = as.factor(c('blue','green','black'))),
               mat3 = data.frame(Col1 = c('id1','id2','id3'), Col2 = c(5,6,7), Col3 = as.factor(c('blue','green','black'))),
               mat4 = data.frame(Col1 = c('id1','id2','id3'), Col2 = c(5,6,7), Col3 = as.factor(c('blue','green','black'))))
myvector1 <- c(seq(8,10,1))
myvector2 <- c(seq(101,103,1))

mydataframe <- data.frame(Col1 = c('id3','id4','id5'), Col2 = seq(21,23,1), Col3 = as.factor(c('blue','green','black')))

foo(mylist, myvector1, myvector2, mydataframe)

summary(mylist$mat2$Col2)
*/

This is similar to what you have but with only a single R comment. You need to call Rcpp::sourceCpp("<file>") to have this compiled and the code in the R comment executed. In RStudio you can use "Source" for this.

Note that in the above example the list is changed in place (by reference). This can be dangerous because R assumes copy-on-write semantics: any other R variable that shares the same data is changed as well. Copying a large list on every modification is of course expensive, which is probably why you are looking for C++ solutions.

R plus C++

If there is only very little C++ code you can use Rcpp::cppFunction("<C++ code>"). I would not use inline::cxxfunction anymore. I rarely use this approach since there is no editor support for the C++ code.

R Markdown

R Markdown allows you to combine R and C++ on equal footing using r and rcpp chunks. This is very useful if you want to add some prose to the analysis. In RStudio you get editor support for both types of chunks. However, the error messages from the C++ compiler sometimes get mangled, making them even harder to interpret.

Multiple files

Project

If the amount of code increases or data files come into play, you might consider some sort of project structure, e.g. directories data, R, and src for data files, R scripts with function definitions, and C++ files, respectively. The driver file (R or Rmd) would then load the data, source the R scripts, and sourceCpp the C++ files before doing the actual analysis.

R package

Once you use a project structure, you can also go for a package. For little extra work you get several advantages:

  • C++ code needs to be compiled only once
  • R functions are byte compiled right away
  • C++ functions can easily be used with parallel processing
  • clear structure to integrate documentation and tests
  • ...

Note that you can start with an empty package and a single Rmd file. In the beginning, you only use the Rmd file. Every now and then you can refactor the code to move part of it as R or C++ functions into the package.

Upvotes: 3