noirritchandra

Reputation: 115

Problem with Using RcppMLPACK in My Own R Package

I am trying to develop an R package using the kmeans functionality from RcppMLPACK. I am including the header part below:

#include <RcppArmadillo.h>
#include <RcppMLPACK.h>
#include <RcppGSL.h>
#include <RcppDist.h>
#include <sstream>
#include <iostream>
#include <fstream>
#include<omp.h>
#include<gsl/gsl_math.h>
#include<gsl/gsl_rng.h>
#include<gsl/gsl_randist.h>
#include<gsl/gsl_sf.h>

// [[Rcpp::depends(RcppProgress)]]
#include <progress.hpp>
#include <progress_bar.hpp>


// [[Rcpp::depends(RcppArmadillo,RcppDist)]]
// [[Rcpp::depends(RcppMLPACK)]]
// [[Rcpp::depends(RcppGSL)]]
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::plugins(openmp)]]

using namespace mlpack::kmeans ;
using namespace arma;

My Makevars file-body is given below:

CXX_STD = CXX17
GSL_CFLAGS=`${R_HOME}/bin/Rscript -e "RcppGSL:::CFlags()" 4`
GSL_LIBS=`${R_HOME}/bin/Rscript -e "RcppGSL:::LdFlags()"`
RCPP_LDFLAGS=`${R_HOME}/bin/Rscript -e "Rcpp:::LdFlags()"`

PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) $(GSL_CFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) $(GSL_LIBS) $(RCPP_LDFLAGS) 

I am using macOS Ventura. When I try to build my R package, it shows the following error:

In file included from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppMLPACK/include/mlpack/core.hpp:171,
                 from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppMLPACK/include/RcppMLPACK.h:4,
                 from RcppExports.cpp:6:
/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppMLPACK/include/mlpack/prereqs.hpp:46:10: fatal error: boost/math/special_functions/gamma.hpp: No such file or directory
  >> 46 | #include <boost/math/special_functions/gamma.hpp>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

However, if I simply call Rcpp::sourceCpp on my C++ file, it compiles perfectly. Please help me debug the issue.

P.S. I am using gcc instead of clang. Both boost and mlpack are installed on my system.

Upvotes: 0

Views: 148

Answers (1)

Dirk is no longer here

Reputation: 368499

The topic is a little underdocumented: mlpack is a large package and contains a lot, but there is no 'quick start' from R. At the same time, your question may have overcomplicated things by including several kitchen sinks' worth of libraries. I find that adding too much too early muddles things.

So here is what I did (using mlpack 3.4.2, see below for mlpack 4.0.1):

Viability

I first created a minimal C++ file that includes just the two headers and does not do much.

It looked like this, give or take:

#include <Rcpp/Rcpp>
#include <mlpack.h>
    
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(mlpack)]]

// [[Rcpp::export]]
void foo() {
    Rcpp::Rcout << "Foo\n";
}

/*** R
foo()
*/

If this compiles, mlpack is being found. I have the CRAN package installed.

Running kmeans (mlpack 3.4.*, see below for 4.0.1)

This gets a little more complicated for me as I happen to (mainly) work on Ubuntu 22.10 which only has an older mlpack 3.4.2 as a convenient system library from the distribution. I think that with a newer mlpack release 4.* I would not need to link.

As I often do, I took a simple example from the unit tests. It has data, as well as an invocation. The full file now is as follows:

#include <Rcpp/Rcpp>
#include <mlpack.h>

// Two include directories adjusted for my use of mlpack 3.4.2 on Ubuntu
#include <mlpack/core.hpp>
#include <mlpack/methods/kmeans/kmeans.hpp>
#include <mlpack/methods/kmeans/random_partition.hpp>
#include <mlpack/methods/neighbor_search/neighbor_search.hpp>

// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(mlpack)]]

// This is 'borrowed' from mlpack's own src/mlpack/tests/kmeans_test.cpp.
// We borrow the data set and the code from the first test function.
// Passing data from R is easy thanks to RcppArmadillo, 'and left as an
// exercise'.

// Generate dataset; written transposed because it's easier to read.
arma::mat kMeansData("  0.0   0.0;" // Class 1.
                     "  0.3   0.4;"
                     "  0.1   0.0;"
                     "  0.1   0.3;"
                     " -0.2  -0.2;"
                     " -0.1   0.3;"
                     " -0.4   0.1;"
                     "  0.2  -0.1;"
                     "  0.3   0.0;"
                     " -0.3  -0.3;"
                     "  0.1  -0.1;"
                     "  0.2  -0.3;"
                     " -0.3   0.2;"
                     " 10.0  10.0;" // Class 2.
                     " 10.1   9.9;"
                     "  9.9  10.0;"
                     " 10.2   9.7;"
                     " 10.2   9.8;"
                     "  9.7  10.3;"
                     "  9.9  10.1;"
                     "-10.0   5.0;" // Class 3.
                     " -9.8   5.1;"
                     " -9.9   4.9;"
                     "-10.0   4.9;"
                     "-10.2   5.2;"
                     "-10.1   5.1;"
                     "-10.3   5.3;"
                     "-10.0   4.8;"
                     " -9.6   5.0;"
                     " -9.8   5.1;");


// [[Rcpp::export]]
arma::Row<size_t> kmeansDemo() {

    mlpack::kmeans::KMeans<mlpack::metric::EuclideanDistance, 
                           mlpack::kmeans::RandomPartition> kmeans;

    arma::Row<size_t> assignments;
    kmeans.Cluster((arma::mat) trans(kMeansData), 3, assignments);

    return assignments;
}

/*** R
kmeansDemo()
*/

Now, because I am on mlpack 3.4.2, I have to link, so I also need to run Sys.setenv("PKG_LIBS"="-lmlpack"). I also had to adjust the headers slightly from the example I took from the repo, where it is set up for mlpack 4.1.*.

The link step will vary depending on where you are running this.

But with that, my R session produces the result:

> Sys.setenv("PKG_LIBS"="-lmlpack")
> Rcpp::sourceCpp("~/git/stackoverflow/76319284/answer.cpp")

> kmeansDemo()
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
[1,]    2    2    2    2    2    2    2    2    2     2     2     2     2     1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0     0
> 
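The three classes in this data set are well separated, which is why the result is so clean. As a rough illustration of what a k-means pass does conceptually, here is a dependency-free sketch of a plain Lloyd iteration with caller-supplied starting centroids. It is not mlpack's implementation (the example above starts from a random partition instead), and the names `Point`, `sqDist` and `lloyd` are made up for this sketch.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::array<double, 2>;   // hypothetical 2-d point type

// Squared Euclidean distance between two 2-d points.
double sqDist(const Point& a, const Point& b) {
    const double dx = a[0] - b[0], dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// Plain Lloyd iteration: assign each point to its nearest centroid,
// recompute centroids as cluster means, stop once labels stop changing.
// `centroids` holds the caller-supplied starting positions.
std::vector<std::size_t> lloyd(const std::vector<Point>& pts,
                               std::vector<Point> centroids,
                               std::size_t maxIter = 100) {
    std::vector<std::size_t> label(pts.size(), 0);
    for (std::size_t it = 0; it < maxIter; ++it) {
        bool changed = (it == 0);   // always do one update after the first pass
        // Assignment step.
        for (std::size_t i = 0; i < pts.size(); ++i) {
            std::size_t best = 0;
            double bestD = std::numeric_limits<double>::max();
            for (std::size_t k = 0; k < centroids.size(); ++k) {
                const double d = sqDist(pts[i], centroids[k]);
                if (d < bestD) { bestD = d; best = k; }
            }
            if (label[i] != best) { label[i] = best; changed = true; }
        }
        if (!changed) break;
        // Update step: mean of the points in each non-empty cluster.
        std::vector<Point> sum(centroids.size(), Point{0.0, 0.0});
        std::vector<std::size_t> count(centroids.size(), 0);
        for (std::size_t i = 0; i < pts.size(); ++i) {
            sum[label[i]][0] += pts[i][0];
            sum[label[i]][1] += pts[i][1];
            ++count[label[i]];
        }
        for (std::size_t k = 0; k < centroids.size(); ++k)
            if (count[k] > 0)
                centroids[k] = {sum[k][0] / count[k], sum[k][1] / count[k]};
    }
    return label;
}
```

Seeding it with one point from each class and feeding it a few of the points from the data set groups them by class. For real work the mlpack call is the better choice; this is only meant to make the steps visible.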

Running kmeans (mlpack 4.0.1)

Things are even better and easier with a (more) current version of mlpack. After I installed 4.0.1 on Ubuntu, the header includes simplified a little, the namespace changed a little, and I added an R package dependency on RcppEnsmallen (which provides optimization routines). Most importantly, I can build this without linking.

Updated Code (for mlpack 4.0.1)

#include <Rcpp/Rcpp>
#include <mlpack.h>

// Adjusted for mlpack 4.0.1
#include <mlpack/core.hpp>
#include <mlpack/methods/kmeans.hpp>
#include <mlpack/methods/kmeans/random_partition.hpp>
#include <mlpack/methods/neighbor_search/neighbor_search.hpp>

// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(RcppEnsmallen)]]
// [[Rcpp::depends(mlpack)]]
// [[Rcpp::plugins(cpp14)]]

// This is 'borrowed' from mlpack's own src/mlpack/tests/kmeans_test.cpp.
// We borrow the data set and the code from the first test function.
// Passing data from R is easy thanks to RcppArmadillo, 'and left as an
// exercise'.

// Generate dataset; written transposed because it's easier to read.
arma::mat kMeansData("  0.0   0.0;" // Class 1.
                     "  0.3   0.4;"
                     "  0.1   0.0;"
                     "  0.1   0.3;"
                     " -0.2  -0.2;"
                     " -0.1   0.3;"
                     " -0.4   0.1;"
                     "  0.2  -0.1;"
                     "  0.3   0.0;"
                     " -0.3  -0.3;"
                     "  0.1  -0.1;"
                     "  0.2  -0.3;"
                     " -0.3   0.2;"
                     " 10.0  10.0;" // Class 2.
                     " 10.1   9.9;"
                     "  9.9  10.0;"
                     " 10.2   9.7;"
                     " 10.2   9.8;"
                     "  9.7  10.3;"
                     "  9.9  10.1;"
                     "-10.0   5.0;" // Class 3.
                     " -9.8   5.1;"
                     " -9.9   4.9;"
                     "-10.0   4.9;"
                     "-10.2   5.2;"
                     "-10.1   5.1;"
                     "-10.3   5.3;"
                     "-10.0   4.8;"
                     " -9.6   5.0;"
                     " -9.8   5.1;");


// [[Rcpp::export]]
arma::Row<size_t> kmeansDemo() {

    mlpack::KMeans<mlpack::EuclideanDistance, mlpack::RandomPartition> kmeans;

    arma::Row<size_t> assignments;
    kmeans.Cluster((arma::mat) trans(kMeansData), 3, assignments);

    return assignments;
}

/*** R
kmeansDemo()
*/

It of course builds and runs the same and now includes a small amount of default logging:

> Rcpp::sourceCpp("answer.cpp")

> kmeansDemo()
[INFO ] KMeans::Cluster(): iteration 1, residual 14.8221.
[INFO ] KMeans::Cluster(): iteration 2, residual 1.77636e-15.
[INFO ] KMeans::Cluster(): converged after 2 iterations.
[INFO ] 186 distance calculations.
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
[1,]    2    2    2    2    2    2    2    2    2     2     2     2     2     0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1     1
> 
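Note that the two runs label the clusters differently (2/1/0 in the first output, 2/0/1 in the second) while grouping the points identically; k-means cluster ids are arbitrary. If you want to compare results across runs or versions, a small helper along the following lines could check that two assignment vectors agree up to relabeling. The function `samePartition` is hypothetical and not part of mlpack; it works on plain `std::vector`s to stay dependency-free.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Returns true when two label vectors describe the same partition,
// i.e. they agree up to a one-to-one renaming of the cluster ids.
bool samePartition(const std::vector<std::size_t>& a,
                   const std::vector<std::size_t>& b) {
    if (a.size() != b.size()) return false;
    std::map<std::size_t, std::size_t> aToB, bToA;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // emplace keeps an existing mapping and returns it, so a
        // mismatch with the stored value means the renaming is
        // inconsistent in one direction or the other.
        auto f = aToB.emplace(a[i], b[i]);
        auto g = bToA.emplace(b[i], a[i]);
        if (f.first->second != b[i] || g.first->second != a[i]) return false;
    }
    return true;
}
```

An `arma::Row<size_t>` can be converted for this with `arma::conv_to<std::vector<size_t>>::from(assignments)`.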

Upvotes: 1
