MJC
MJC

Reputation: 1

Using sample() from within Rcpp

I have a matrix containing probabilities, with each of the four columns corresponding to a score (an integer in sequence from 0 to 4). I want to sample a single score for each row using the probabilities contained in that row as sampling weights. In rows where some columns do not contain probabilities (NAs instead), the sampling frame is limited to the columns (and their corresponding scores) which do (e.g. for a row with 0.45,0.55,NA,NA, either 0 or 1 would be sampled). However, I get this error (followed by several others), so how can I make it work?:

error: no matching function for call to 'as<Rcpp::IntegerVector>(Rcpp::Matrix<14>::Sub&)'
     score[i] = sample(scrs,1,true,as<IntegerVector>(probs));

Existing answers suggest RcppArmadillo is the solution but I can't get that to work either. If I add: require(RcppArmadillo) before the cppFunction and score[i] = Rcpp::RcppArmadillo::sample(scrs,1,true,probs); in place of the existing sample() statement, I get:

error: 'Rcpp::RcppArmadillo' has not been declared
     score[i] = Rcpp::RcppArmadillo::sample(scrs,1,true,probs);

Or if I also include, #include <RcppArmadilloExtensions/sample.h> at the top, I get:

fatal error: RcppArmadilloExtensions/sample.h: No such file or directory
   #include <RcppArmadilloExtensions/sample.h>

Reproducible code:

p.vals <- matrix(c(0.44892077,0.55107923,NA,NA,
                 0.37111195,0.62888805,NA,NA,
                 0.04461714,0.47764478,0.303590351,1.741477e-01,
                 0.91741642,0.07968127,0.002826406,7.589714e-05,
                 0.69330800,0.24355559,0.058340934,4.795468e-03,
                 0.43516823,0.43483784,0.120895859,9.098067e-03,
                 0.73680809,0.22595438,0.037237525,NA,
                 0.89569365,0.10142719,0.002879163,NA),nrow=8,ncol=4,byrow=TRUE)

step.vals <- c(1,1,3,3,3,3,2,2)

require(Rcpp)
cppFunction('IntegerVector scores_cpp(NumericMatrix p, IntegerVector steps){

  int prows = p.nrow();

  IntegerVector score(prows);
  
  for(int i=0;i<prows;i++){
    int step = steps[i];
    
    IntegerVector scrs = seq(0,step);
    
    int start = 0;
    int end = step;
    
    NumericMatrix::Sub probs = p(Range(i,i),Range(start,end));

    score[i] = sample(scrs,1,true,probs);
  }
  
  return score;
  
}')

test <- scores_cpp(p.vals,step.vals)
test

Note: the value of step.vals for each row is always equal to the number of columns containing probabilities in that row -1. So passing the step.values to the function may be extraneous.

Upvotes: 0

Views: 401

Answers (2)

MJC
MJC

Reputation: 1

Many thanks for the pointers. I also had some problems with indexing the matrix, so that part is changed, too. The following code works as intended (using sourceCpp()):

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

#include <RcppArmadilloExtensions/sample.h>

using namespace Rcpp;

// [[Rcpp::export]]

IntegerVector scores_cpp(NumericMatrix p, IntegerVector steps){
  
  int prows = p.nrow();
  
  IntegerVector score(prows);
  
  for(int i=0;i<prows;i++){
    int step = steps[i];
    
    IntegerVector scrs = seq(0,step);
    
    NumericMatrix probs = p(Range(i,i),Range(0,step));

    IntegerVector sc = RcppArmadillo::sample(scrs,1,true,probs);
    score[i] = sc[0];
  }
  
  return score;
  
}

Upvotes: 0

Dirk is no longer here
Dirk is no longer here

Reputation: 368201

You may be having a 'forest for the trees' moment here. The RcppArmadillo unit tests actually provide a working example. If you look at the source file inst/tinytest/test_sample.R, it has a simple

Rcpp::sourceCpp("cpp/sample.cpp")

and in the that file inst/tinytest/cpp/sample.cpp we have the standard

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

#include <RcppArmadilloExtensions/sample.h>

to a) tell R to look at RcppArmadillo header directories and b) include the sampler extensions. This is how it works, and this has been documented to work for probably close to a decade.

As an example I can just do (in my $HOME directory containing git/rcpparmadillo)

> Rcpp::sourceCpp("git/rcpparmadillo/inst/tinytest/cpp/sample.cpp")
> set.seed(123)
> csample_integer(1:5, 10, TRUE, c(0.4, 0.3, 0.2, 0.05, 0.05))
 [1] 1 3 2 3 4 1 2 3 2 2
> 

The later Rcpp addition works the same way, but I find working with parts of matrices to be more expressive and convenient with RcppArmadillo.

Edit: Even simpler for anybody with the RcppArmadillo package installed:

< library(Rcpp)
> sourceCpp(system.file("tinytest","cpp","sample.cpp", package="RcppArmadillo"))
> set.seed(123)
> csample_integer(1:5, 10, TRUE, c(0.4, 0.3, 0.2, 0.05, 0.05))
 [1] 1 3 2 3 4 1 2 3 2 2
> 

Upvotes: 2

Related Questions