Jeroen Ooms

Reputation: 32988

How to prevent Rcpp from evaluating 'call' objects

I need a simple wrapper to serialize arbitrary R objects from within Rcpp code. Below is a simplified version of my code:

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::RawVector cpp_serialize(Rcpp::RObject x) {
  // Look up base::serialize and call it through Rcpp::Function
  Rcpp::Function serialize = Rcpp::Environment::namespace_env("base")["serialize"];
  return serialize(x, R_NilValue);
}

This works well; however, I found that for objects of class call, the call gets evaluated before being serialized. How can I prevent this from happening? I just want to mimic serialize() in R.

# Works as intended
identical(serialize(iris, NULL), cpp_serialize(iris))

# Does not work: call is evaluated
call_object <- call("rnorm", 1000)
identical(serialize(call_object, NULL), cpp_serialize(call_object))

Update: I have a workaround in place (see below) but I am still very interested in a proper solution.

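// [[Rcpp::export]]
Rcpp::RawVector cpp_serialize(Rcpp::RObject x) {
  Rcpp::Environment env;
  env["MY_R_OBJECT"] = x;   // bind the object to a name so only the symbol appears in the call
  Rcpp::ExpressionVector expr("serialize(MY_R_OBJECT, NULL)");
  Rcpp::RawVector buf = Rcpp::Rcpp_eval(expr, env);
  return buf;               // the original snippet was missing this return
}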

Upvotes: 4

Views: 181

Answers (2)

Dirk is no longer here

Reputation: 368519

tl;dr: The question was: how does one serialize to raw vectors from C? The (compiled C) function serializeToRaw() in the RApiSerialize package provides R's own serialization code. As the benchmark below shows, it is between three and four times faster than what was suggested above.

Longer Answer: I would not recommend mucking around with Rcpp::Function() for this. We do in fact provide a proper package for R which gives access to serialization: RApiSerialize. It does not do much, but it exports exactly two functions, one to serialize to and one to deserialize from RAW, which the RcppRedis package needs and uses.

So we can do the same here. I just called Rcpp.package.skeleton() to have a package 'jeroen' created, added the LinkingTo: and Imports: entries to DESCRIPTION and the import() directive to NAMESPACE, and then this works:

#include <Rcpp.h>
#include <RApiSerializeAPI.h>       // provides C API with serialization

// [[Rcpp::export]]
Rcpp::RawVector cpp_serialize(SEXP s) {
  Rcpp::RawVector x = serializeToRaw(s);    // from RApiSerialize
  return x;
}

It is basically a simpler version of what you have above.

And we can call that as you do:

testJeroen <- function() {
    ## Works as intended
    res <- identical(serialize(iris, NULL), cpp_serialize(iris))

    ## Didn't work above, works now
    call_object <- call("rnorm", 1000)
    res <- res && 
           identical(serialize(call_object, NULL), cpp_serialize(call_object))

    res
}

and lo and behold, it works:

R> library(jeroen)
Loading required package: RApiSerialize
R> testJeroen()
[1] TRUE
R> 

So in short: if you don't want to muck with R, don't work with Rcpp::Function() objects.

Benchmark: Using a simple

library(jeroen)             # package containing both functions from here 
library(microbenchmark)
microbenchmark(cpp=cpp_serialize(iris),  # my suggestion
               env=env_serialize(iris))  # OP's suggestion, renamed

we get

edd@max:/tmp/jeroen$ Rscript tests/quick.R 
Loading required package: RApiSerialize
Unit: microseconds
 expr    min      lq    mean  median      uq     max neval cld
  cpp 17.471 22.1225 28.0987 24.4975 26.4795 420.001   100  a 
  env 85.028 91.0055 94.8772 92.9465 94.9635 236.710   100   b
edd@max:/tmp/jeroen$ 

showing that the answer by OP is between three and four times slower (median 92.9 vs. 24.5 microseconds).

Upvotes: 1

Kevin Ushey

Reputation: 21315

I think you've found an unexpected behavior in the Rcpp::Function class. An MRE:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
RObject cpp_identity(RObject x) {
  Rcpp::Function identity("identity");
  return identity(x);
}

/*** R
quoted <- quote(print(1));
identity(quoted)
cpp_identity(quoted)
*/

gives

> quoted <- quote(print(1));

> identity(quoted)
print(1)

> cpp_identity(quoted)
[1] 1
[1] 1

This happens because Rcpp effectively performs this evaluation behind the scenes:

Rcpp_eval(Rf_lang2(Rf_install("identity"), x))

which is basically like

eval(call("identity", quoted))

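but the call object is not 'protected' from evaluation: quoted is inserted into the new call as an argument, so evaluating identity(...) also evaluates print(1). That is why 1 is printed (once by print() itself and once when the returned value is auto-printed) instead of the call being returned.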
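
The same effect can be reproduced in plain R, which may make the mechanism easier to see. This is only a sketch: it assumes the call Rcpp builds behaves like one constructed with call(), and the quote() wrapping shown here is one possible way to protect the argument, not something Rcpp is confirmed to do internally. The names unprotected, protected and result are made up for illustration.

```r
quoted <- quote(print(1))

# The language object is spliced into the new call as an argument,
# so evaluating the outer call also evaluates print(1).
unprotected <- call("identity", quoted)   # builds identity(print(1))
eval(unprotected)                         # evaluates print(1) as a side effect

# Wrapping the argument in quote() keeps it unevaluated (illustrative workaround).
protected <- call("identity", call("quote", quoted))   # identity(quote(print(1)))
result <- eval(protected)
identical(result, quoted)                 # the call comes back untouched
```

On the C++ side the analogous idea would be to wrap x in a quote() call before handing it to the Rcpp::Function, although I have not benchmarked or tested that variant here.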

Upvotes: 3
