Phi89

Subsetting with a logical vector in an Rcpp function (problems of an Rcpp beginner)

Problem description (think of a membership with different prices for adults and kids): I have two data sets. The first contains each customer's age and a code. A second data frame "decodes" the codes to numeric values, depending on whether someone is a kid or an adult. I now want to match the codes in both data sets and get a vector that contains the numeric value for each customer in the data set.

I can make this work with standard R functionality, but since my original data contains several million observations, I would like to speed up the computation using the Rcpp package.

Unfortunately, I have not succeeded; in particular, I do not know how to subset based on a logical vector as I would in R. I am quite new to Rcpp and have no experience with C++, so I may be missing something very basic.

I have attached a minimal working example in R and would appreciate any kind of help or explanation!


library(Rcpp)

raw_data = data.frame(
       age = c(10, 14, 99, 67, 87, 54, 12, 44, 22, 8),
       iCode = c("code1", "code2", "code3", "code1", "code4", "code3", "code2", "code5", "code5", "code3"))

decoder = data.frame(
        code = c("code1","code2","code3","code4","code5"),
        kid = c(0,0,0,0,100),
        adult = c(100,200,300,400,500))

#-------- R approach (works, but takes ages for my original data set)
calc_value = function(data, decoder){
  y = numeric(nrow(data))  # preallocate one value per customer
  for (i in 1:nrow(data)){
    # logical vector marking the decoder row whose code matches this customer
    position_in_decoder = (data$iCode[i] == decoder$code)
    if (data$age[i] > 18){
      y[i] = decoder$adult[position_in_decoder]
    } else {
      y[i] = decoder$kid[position_in_decoder]
    }
  }
  return(y)
}

y = calc_value(raw_data, decoder)

#--------- RCPP approach (I cannot make this one work) :(

cppFunction(
'NumericVector calc_Rcpp(DataFrame df, DataFrame decoder) {
 NumericVector age = df["age"];
 CharacterVector iCode = df["iCode"];
 CharacterVector code = decoder["code"];
 NumericVector adult = decoder["adult"];
 NumericVector kid = decoder["kid"];
 const int n = age.size();
 LogicalVector position;
 NumericVector y(n);

  for (int i=0; i < n; ++i) {
    position = (iCode[i] == code);
    if (age[i] > 18 ) y[i] = adult[position];
    else y[i] = kid[position];
    }
  return y;
  }')
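The main problem in the Rcpp attempt above is that adult[position] uses logical subsetting, which in Rcpp yields a NumericVector rather than a single double, so it cannot be assigned to the element y[i]. One way around this is to convert the codes to integer positions first with Rcpp sugar's match() (the C++ counterpart of R's match()) and then index with a plain integer. This is a minimal sketch, assuming Rcpp and a working C++ toolchain are installed; returning NA for codes missing from the decoder is an added assumption, not something the question specifies. The data frames are redefined so the block runs on its own.

```r
library(Rcpp)

# Example data from the question; stringsAsFactors = FALSE keeps the codes as
# character vectors in R versions before 4.0, where factors were the default
raw_data <- data.frame(
  age = c(10, 14, 99, 67, 87, 54, 12, 44, 22, 8),
  iCode = c("code1", "code2", "code3", "code1", "code4",
            "code3", "code2", "code5", "code5", "code3"),
  stringsAsFactors = FALSE)

decoder <- data.frame(
  code = c("code1", "code2", "code3", "code4", "code5"),
  kid = c(0, 0, 0, 0, 100),
  adult = c(100, 200, 300, 400, 500),
  stringsAsFactors = FALSE)

cppFunction('
NumericVector calc_Rcpp(DataFrame df, DataFrame decoder) {
  NumericVector age = df["age"];
  CharacterVector iCode = df["iCode"];
  CharacterVector code = decoder["code"];
  NumericVector adult = decoder["adult"];
  NumericVector kid = decoder["kid"];

  // Rcpp sugar match(): 1-based position of each iCode in code,
  // NA_INTEGER where there is no match. Done once, outside the loop.
  IntegerVector idx = match(iCode, code);

  const int n = age.size();
  NumericVector y(n);
  for (int i = 0; i < n; ++i) {
    if (idx[i] == NA_INTEGER) {  // code not in decoder (assumption: return NA)
      y[i] = NA_REAL;
      continue;
    }
    const int j = idx[i] - 1;    // convert to a 0-based C++ index
    // plain integer indexing yields a single double, not a vector
    y[i] = (age[i] > 18) ? adult[j] : kid[j];
  }
  return y;
}')

calc_Rcpp(raw_data, decoder)
```

For the example data this should return c(0, 0, 300, 100, 400, 300, 0, 500, 500, 0), the same values as calc_value(), in the original row order.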


Answers (1)

Ralf Stubner

There is no need to resort to C++ here. Just use R properly:

raw_data = data.frame(
  age = c(10, 14, 99, 67, 87, 54, 12, 44, 22, 8),
  iCode = c("code1", "code2", "code3", "code1", "code4", "code3", "code2", "code5", "code5", "code3"))

decoder = data.frame(
  code = c("code1","code2","code3","code4","code5"),
  kid = c(0,0,0,0,100),
  adult = c(100,200,300,400,500))

foo <- merge(raw_data, decoder, by.x = "iCode", by.y = "code")
foo$res <- ifelse(foo$age > 18, foo$adult, foo$kid)
foo
#>    iCode age kid adult res
#> 1  code1  10   0   100   0
#> 2  code1  67   0   100 100
#> 3  code2  14   0   200   0
#> 4  code2  12   0   200   0
#> 5  code3  54   0   300 300
#> 6  code3  99   0   300 300
#> 7  code3   8   0   300   0
#> 8  code4  87   0   400 400
#> 9  code5  44 100   500 500
#> 10 code5  22 100   500 500

Since merge() and ifelse() are vectorized, this should also work for large data sets.
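merge() sorts its result by the key column by default (sort = TRUE), which is why the output above is ordered by iCode rather than by the rows of raw_data. If the original row order matters, a minimal base-R sketch using match() keeps it while staying vectorized; the column name res simply mirrors the merge example, and NA is produced for any code missing from decoder.

```r
# Example data from the question
raw_data <- data.frame(
  age = c(10, 14, 99, 67, 87, 54, 12, 44, 22, 8),
  iCode = c("code1", "code2", "code3", "code1", "code4",
            "code3", "code2", "code5", "code5", "code3"),
  stringsAsFactors = FALSE)

decoder <- data.frame(
  code = c("code1", "code2", "code3", "code4", "code5"),
  kid = c(0, 0, 0, 0, 100),
  adult = c(100, 200, 300, 400, 500),
  stringsAsFactors = FALSE)

# Row of decoder for each customer (NA if the code is unknown)
pos <- match(raw_data$iCode, decoder$code)

# Pick the adult or kid value per customer; rows keep raw_data's order
raw_data$res <- ifelse(raw_data$age > 18, decoder$adult[pos], decoder$kid[pos])
raw_data
```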
