maurobio
maurobio

Reputation: 1587

Rcpp function complaining about unintialized variables

In a very first attempt at creating a C++ function which can be called from R using Rcpp, I have a simple function to compute a minimum spanning tree from a distance matrix using Prim's algorithm. This function has been converted into C++ from a former version in ANSI C (which works fine).

Here it is:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
DataFrame primlm(const int n, NumericMatrix d)
{
    double const din = 9999999.e0;
    long int i1, nc, nc1;
    double dlarge, dtot;
    NumericVector is, l, lp, dist;
    l(1) = 1;
    is(1) = 1;
    for (int i=2; i <= n; i++) {
        is(i) = 0;
    }
    for (int i=2; i <= n; i++) {
        dlarge = din;
        i1 = i - 1;
        for (int j=1; j <= i1; j++) {
            for (int k=1; k <= n; k++) {
                if (l(j) == k)
                    continue;
                if (d[l(j), k] > dlarge)
                    continue;
                if (is(k) == 1)
                    continue;
                nc = k;
                nc1 = l(j);
                dlarge = d(nc1, nc);
            }
        }
        is(nc) = 1;
        l(i) = nc;
        lp(i) = nc1;
        dist(i) = dlarge;
    }
    dtot = 0.e0;
    for (int i=2; i <= n; i++){
        dtot += dist(i);
    }
    return DataFrame::create(Named("l") = l, 
                          Named("lp") = lp, 
                          Named("dist") = dist, 
                          Named("dtot") = dtot);
}

When I compile this function using Rcpp under RStudio, I get two warnings, complaining that variables 'nc' and 'nc1' have not been initialized. Frankly, I could not understand that, as it seems to me that both variables are being initialized inside the third loop. Also, why there is no similar complaint about variable 'i1'?

Perhaps it comes as no surprise that, when attempting to call this function from R, using the below code, what I get is a crash of the R system!

# Read test data
df <- read.csv("zygo.csv", header=TRUE)
lonlat <- data.frame(df$Longitude, df$Latitude)
colnames(lonlat) <- c("lon", "lat")

# Compute distance matrix using geosphere library
library(geosphere)
d <- distm(lonlat, lonlat, fun=distVincentyEllipsoid)

# Calls Prim minimum spanning tree routine via Rcpp
library(Rcpp)
sourceCpp("Prim.cpp")
n <- nrow(df)
p <- primlm(n, d) 

Here is the dataset I use for testing purposes:

"Scientific name",Locality,Longitude,Latitude Zygodontmys,Bush Bush
Forest,-61.05,10.4 Zygodontmys,Cerro Azul,-79.4333333333,9.15
Zygodontmys,Dividive,-70.6666666667,9.53333333333 Zygodontmys,Hato El
Frio,-63.1166666667,7.91666666667 Zygodontmys,Finca Vuelta
Larga,-63.1166666667,10.55 Zygodontmys,Isla
Cebaco,-81.1833333333,7.51666666667 Zygodontmys,Kayserberg
Airstrip,-56.4833333333,3.1 Zygodontmys,Limao,-60.5,3.93333333333
Zygodontmys,Montijo Bay,-81.0166666667,7.66666666667
Zygodontmys,Parcela 200,-67.4333333333,8.93333333333 Zygodontmys,Rio
Chico,-65.9666666667,10.3166666667 Zygodontmys,San Miguel
Island,-78.9333333333,8.38333333333
Zygodontmys,Tukuko,-72.8666666667,9.83333333333
Zygodontmys,Urama,-68.4,10.6166666667
Zygodontmys,Valledup,-72.9833333333,10.6166666667

Could anyone give me a hint?

Upvotes: 1

Views: 148

Answers (1)

Ralf Stubner
Ralf Stubner

Reputation: 26823

The initializations of ncand nc1 are never reached if one of the three if statements is true. It might be that this is not possible with your data, but the compiler has no way knowing that.

However, this is not the reason for the crash. When I run your code I get:

Index out of bounds: [index=1; extent=0].

This comes from here:

NumericVector is, l, lp, dist;
l(1) = 1;
is(1) = 1;

When declaring a NumericVector you have to tell the required size if you want to assign values by index. In your case

NumericVector is(n), l(n), lp(n), dist(n);

might work. You have to analyze the C code carefully w.r.t. memory allocation and array boundaries.

Alternatively you could use the C code as is and use Rcpp to build a wrapper function, e.g.

#include <array>
#include <Rcpp.h>
using namespace Rcpp;

// One possibility for the function signature ...
double prim(const int n, double *d, double *l, double *lp, double *dist) {
    ....
} 

// [[Rcpp::export]]
List primlm(NumericMatrix d) {
    int n = d.nrow();
    std::array<double, n> lp;    // adjust size as needed!
    std::array<double, n> dist;  // adjust size as needed!
    double dtot = prim(n, d.begin(), l.begin(), lp.begin(), dist.begin()); 

    return List::create(Named("l") = l, 
                        Named("lp") = lp, 
                        Named("dist") = dist, 
                        Named("dtot") = dtot);
}

Notes:

  • I am returning a List instead of a DataFrame since dtot is a scalar value.
  • The above code is meant to illustrate the idea. Most likely it will not work without adjustments!

Upvotes: 4

Related Questions