Sean S
Sean S

Reputation: 108

Session crash after receiving results from Rcpp function

The following code compiles and executes correctly, but everytime I run it, my R session gets a fatal error shortly after it finishes. I'm running R version 3.3.2 and Rtools 3.3.

Is there anything I missed? How can I trace what's causing the crash?

#include<Rcpp.h>
using namespace Rcpp;

NumericMatrix dupCheckRcpp(NumericMatrix x) {
    int nrow, ncol;
    int i, j, k, m, n;
    bool flag;
    NumericMatrix dupMat(300,ncol);

    n = 0;
    nrow = 0; ncol = 0;
    nrow = x.nrow();
    ncol = x.ncol();

    for (i = 0; i < nrow - 1 ; ++i) {
        for (j = i + 1; j < nrow; ++j) {
            flag = TRUE;
            for (k = 0; k < ncol; ++k) {
                if (x(i,k) != x(j,k)) {
                    flag = FALSE;
                    break;
                }
            }
            if (flag == TRUE) {
                for (m = 0; m < ncol; ++m) {
                    dupMat(n,m) = x(i,m);
                }
                n = n + 1;
            }
        }
    }
    return dupMat;
}

Upvotes: 0

Views: 410

Answers (1)

coatless
coatless

Reputation: 20746

There are a few issues that are problematic with your code. We begin by looking at how the result matrix is defined, use of bool, and then detail issues with undefined behavior (UB) as a result of the matrix subset.


The definition of:

NumericMatrix dupMat(300, ncol);

has two issues:

  1. It is placed before ncol has been initialized
  2. Assumes the x matrix has nrow fixed at 300.

Move the instantiation of the dupMat till after ncol and nrow are initialized. Alternatively, move it until after you know the amount of duplicate rows.

nrow = x.nrow();
ncol = x.ncol();

Rcpp::NumericMatrix dupMat(nrow, ncol); 

In addition, bool values within C++ are written in lower case.

That is, use true in place of TRUE and false in place of FALSE while setting the values of flag variable.


There are three ways to access individual elements in a NumericMatrix, however, we'll only focus on two of them that use i,j indices.

  • (i,j): Accessing elements in this manner forgoes a bounds check and subsequent exception flag that warns if the point is not within range. In essence, this access method was causing a UB since n = n + 1 could easily go beyond the row index. Probably the UB caused havoc at a later point when RStudio or R ran a background task causing the crash to happen.
  • .at(i,j): This is the preferred method as it provides a bounds check and throws a nifty exception e.g.

Error in dupCheckRcpp(a) : index out of bounds

Which is triggered by the following code snippet:

if (flag == true) {
    for (m = 0; m < ncol; ++m) {
        Rcpp::Rcout << "dupMat (" << n << ","<< m << ")" << std::endl <<
            "x (" << i << ","<< m << ")" << std::endl;
        dupMat.at(n, m) = x.at(i, m);
    }
    n = n + 1; // able to exceed nrow.
}

The main reason for n = n + 1 to be hitting the upper bound is due to the placement being within the second for loop that gets re-instantiated each time.


Without knowing your intent behind the duplicate check, outside of guessing that it is checking for duplication that may be present within matrix rows. I'm going to stop here.

Upvotes: 4

Related Questions