Gustavo

Reputation: 126

Rcpp memory management

I am trying to convert some character data to numeric, as shown below. The data will contain special characters, so I have to strip them out. I convert each element to std::string to search for the special characters. Does this create a new variable in memory? I would also like to know if there is a better way to do it.

NumericVector converter_ra_(Rcpp::RObject x){
  if(x.sexp_type() == STRSXP){
    CharacterVector y(x);
    NumericVector resultado(y.size());
    for(unsigned int i = 0; i < y.size(); i++){
      std::string ra_string = Rcpp::as<std::string>(y[i]);
      //std::cout << ra_string << std::endl;
      double t = 0;
      int base = 0;
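      // Walk the string from the last character to the first.
      for(int j = (int)ra_string.size() - 1; j >= 0; j--){
        if(ra_string[j] >= '0' && ra_string[j] <= '9'){
          // base_m is defined elsewhere (not shown); presumably powers of ten
          t += ((ra_string[j] - '0') * base_m[base]);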
          base++;
        }
      }
      //std::cout << t << std::endl;
      resultado[i] = t;
    }
    return resultado;
  }else if(x.sexp_type() == REALSXP){
    return NumericVector(x);
  }
  return NumericVector();
}

Upvotes: 1

Views: 1327

Answers (1)

nrussell

Reputation: 18612

Does it create a new variable in memory?

If the input object actually is a numeric vector (REALSXP) and you are simply returning it, e.g. via as<NumericVector>(input), then no additional memory is allocated: the returned vector refers to the same underlying R object. In any other case new memory will, of course, need to be allocated for the returned object. For example,

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector demo(RObject x) {
    if (x.sexp_type() == REALSXP) {
        return as<NumericVector>(x);
    }

    return NumericVector::create();
}

/*** R

y <- rnorm(3)
z <- letters[1:3]

data.table::address(y)
# [1] "0x6828398"

data.table::address(demo(y))
# [1] "0x6828398"

data.table::address(z)
# [1] "0x68286f8"

data.table::address(demo(z))
# [1] "0x5c7eea0"

*/

I want to know if there is a better way to do it.

First you need to define "better":

  • Faster?
  • Uses less memory?
  • Fewer lines of code?
  • More idiomatic?

Personally, I would start with the last definition since it often entails one or more of the others. For example, in this approach we

  • Define a function object Predicate that relies on the standard library function isdigit rather than trying to implement this locally
  • Define another function object that uses the erase-remove idiom to eliminate characters as determined by Predicate, then uses std::atof to convert what remains into a double, or returns NA_REAL if nothing remains (again, instead of trying to implement this ourselves)
  • Use an Rcpp idiom -- the as converter -- to convert the STRSXP to a std::vector<std::string>
  • Call std::transform to convert this into the result vector

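#include <Rcpp.h>
#include <algorithm>  // std::remove_if, std::transform
#include <cctype>     // std::isdigit
#include <cstdlib>    // std::atof
#include <string>
#include <vector>
using namespace Rcpp;

// Returns true for characters that should be removed,
// i.e. anything that is neither a digit nor a decimal point.
struct Predicate {
    bool operator()(char c) const
    { return !(c == '.' || std::isdigit(static_cast<unsigned char>(c))); }
};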

struct Converter {
    double operator()(std::string s) const {
        s.erase(
            std::remove_if(s.begin(), s.end(), Predicate()),
            s.end()
        );

        return s.empty() ? NA_REAL : std::atof(s.c_str());
    }
};

// [[Rcpp::export]]
NumericVector convert(RObject obj) {
    if (obj.sexp_type() == REALSXP) {
        return as<NumericVector>(obj);
    }
    if (obj.sexp_type() != STRSXP) {
        return NumericVector::create();
    }

    std::vector<std::string> x = as<std::vector<std::string> >(obj);
    NumericVector res(x.size(), NA_REAL);

    std::transform(x.begin(), x.end(), res.begin(), Converter());
    return res;
}

Testing this for minimal functionality,

x <- c("123 4", "abc 1567.35 def", "abcdef", "")
convert(x)
# [1] 1234.00 1567.35      NA      NA

(y <- rnorm(3))
# [1]  1.04201552 -0.08965042 -0.88236960

convert(y)
# [1]  1.04201552 -0.08965042 -0.88236960

convert(list())
# numeric(0)
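
For readers unfamiliar with the erase-remove idiom used in Converter, here it is in isolation as a minimal sketch in plain standard C++ (no Rcpp). The helper name strip_non_numeric is hypothetical, and the lambda assumes a C++11 compiler, whereas the code above also compiles as C++98.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: drop every character that is neither a digit nor '.'.
// The string is taken by value, so the caller's copy is left untouched.
std::string strip_non_numeric(std::string s) {
    // std::remove_if moves the kept characters to the front and returns an
    // iterator to the new logical end; erase() then trims the leftover tail.
    s.erase(
        std::remove_if(s.begin(), s.end(), [](unsigned char c) {
            return !(c == '.' || std::isdigit(c));
        }),
        s.end()
    );
    return s;
}
```

Calling std::remove_if on its own does not shrink the string, which is why it is paired with erase.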

Will this be as performant as something hand-written by a seasoned C or C++ programmer? Almost certainly not. However, since we used library functions and common idioms, it is reasonably concise, likely to be bug-free, and the intention is fairly evident even at a quick glance. If you need something faster then there are probably a handful of optimizations to be made, but there's no need to begin on that premise without benchmarking and profiling first.
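
If profiling did point at the conversion step, one candidate optimization would be to skip the erase step and accumulate the number in a single pass over the characters. The following is only a sketch in plain C++11 (no Rcpp), not a drop-in replacement: parse_digits is a hypothetical name, it returns NaN where the Rcpp version returns NA_REAL, and it handles some edge cases differently from std::atof (for instance, any '.' after the first is ignored rather than ending the number). Whether it is actually faster would have to be measured.

```cpp
#include <cmath>
#include <limits>
#include <string>

// Hypothetical single-pass alternative to erase + std::atof.
// Returns NaN (standing in for NA_REAL) when the input contains no digits.
double parse_digits(const std::string& s) {
    double mantissa = 0.0;    // all digits seen so far, read as one integer
    int frac_digits = 0;      // number of digits seen after the first '.'
    bool seen_dot = false;
    bool seen_digit = false;

    for (char c : s) {
        if (c >= '0' && c <= '9') {
            mantissa = mantissa * 10.0 + (c - '0');
            seen_digit = true;
            if (seen_dot) ++frac_digits;
        } else if (c == '.') {
            seen_dot = true;  // assumption: later '.' characters are ignored
        }
        // every other character is skipped
    }

    if (!seen_digit) return std::numeric_limits<double>::quiet_NaN();
    // Shift the decimal point back by the number of fractional digits.
    return mantissa / std::pow(10.0, frac_digits);
}
```

Taking the string by const reference avoids the per-element copy that Converter makes, at the cost of more hand-written logic to test.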

Upvotes: 5
