heavy rocker dude

Reputation: 2301

Benchmarking Rcpp or RCaller: C++ or Java calling an R script?

I have looked high and low for this answer, so I resorted to posting here. Should I expect any noticeable latency if a Linux C++ program calls an R script/function through something like Rcpp? Would that be expected, or even sound reasonable? What if I use something like RCaller from a Java JAR file instead? Would C++ still be faster than Java when both call the same R script/function? Are there any expected differences? Thanks.

Upvotes: 2

Views: 691

Answers (2)

Dirk is no longer here

Reputation: 368399

I think you want RInside, which makes it very easy to embed R in your C++ application. It ships with numerous examples in four directories, including some that use it with Qt, Wt (for webapps) and MPI.

In that framework, you instantiate R once at startup and then have your own instance. Round-trip latency will be whatever time it takes you to send a command to the R instance, plus however long R takes (which may well dominate) plus the return.

RInside uses Rcpp so you get whole object transfer and all the other niceties. Have a look at the RInside example, and post questions on the rcpp-devel list.

Upvotes: 1

Gene

Reputation: 46990

I don't have special knowledge of the R foreign function interface or Rcpp, but I have worked with quite a few others. Your questions can't be answered with certainty. There are only some rules of thumb.

The job of an FFI is to satisfy the assumptions of both the calling and the called environments. This is usually about matching the data layouts of both languages by copying from one to the other. (This is what Rcpp is all about.) Or you can be very lucky and have them match. If the runtime environments are similar, or the data being moved over the boundary between languages is small, this can be very efficient: not much more costly than a native function call. Calling C from Fortran is often very fast for this reason.

If the environments are very different, large data structures must be copied. Copies consume resources: memory and processor cycles. Garbage collection is the poster child for differences between environments: separate collectors will seldom (read: never) cooperate. R and Java (both garbage collected) will probably require copying for this reason. If you are writing the C++ specifically to call R, you may be able to set up your data in Rcpp structures so that no copies are needed.

I'd write some small tests starting with C++ that mimic the amount of data you must move through the interface. Run and time them to get the overhead cost. From this you can make real decisions.
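As a rough illustration of such a test, here is a minimal timing harness in plain C++. It is only a sketch: `call_with_copy` and `median_micros` are hypothetical names, and `call_with_copy` is a stand-in that merely simulates a boundary copy plus a trivial computation. It does not call R; you would swap in your own function that performs the real call.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the cross-language call. It copies the payload
// (as a marshalling layer might) and sums it in place of the real R work.
double call_with_copy(const std::vector<double>& payload) {
    std::vector<double> marshalled(payload);  // simulated boundary copy
    return std::accumulate(marshalled.begin(), marshalled.end(), 0.0);
}

// Median wall-clock time, in microseconds, of fn(payload) over `reps` runs.
// Assumes reps > 0. The median is used so one slow outlier does not skew it.
double median_micros(const std::function<double(const std::vector<double>&)>& fn,
                     const std::vector<double>& payload, int reps) {
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(reps));
    volatile double sink = 0.0;  // stops the compiler from discarding the call
    for (int i = 0; i < reps; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        sink = fn(payload);
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    (void)sink;
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

To use it, build payloads of a few representative sizes (say 1e3, 1e5 and 1e7 doubles), call `median_micros` with a lambda that wraps your actual call into R, and compare the results. If the time grows with the payload size even when the R function does almost nothing, the difference is mostly interface overhead.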

Upvotes: 1
