Tim P

Reputation: 1383

Rcpp: neat way to compare strings derived from an R data frame?

I'm having some headaches handling strings in Rcpp. I have looked at "How to test Rcpp::CharacterVector elements for equality", but my situation is a bit more complex than that.

To illustrate, suppose we have a 200-row data frame of names and marks, generated randomly:

df = data.frame(name = paste("Person",
                             sample(LETTERS[1:10], 200, replace = TRUE), sep = ""),
                mark = pmax(pmin(round(rnorm(200, 60, 15)), 100), 0),
                stringsAsFactors = FALSE)

I found that the following inline code (using Rcpp) correctly works out the sum of the marks for all the rows where the person named is the first person given in the data frame (i.e. df$name[1] in R, or equivalently name[0] in the Rcpp code):

library(inline)

fastfunc_good1 <- cxxfunction(
    signature(DFin = "data.frame"),
    plugin = "Rcpp",
    body = '
        Rcpp::DataFrame DF(DFin);
        Rcpp::CharacterVector name = DF["name"];
        Rcpp::IntegerVector mark = DF["mark"];
        Rcpp::CharacterVector targetname(1);
        Rcpp::CharacterVector thisname(1);      

        int n = name.length();
        int tot = 0;
        targetname = name[0];
        std::string s_targetname = as<std::string>(targetname);

        for (int i = 0; i < n; i++) {
            thisname=name[i];
            std::string s_thisname = as<std::string>(thisname);
            if (s_thisname == s_targetname) {
                tot = tot + mark[i];
            }
        }

        return(Rcpp::wrap(tot));
        ')

Now, I really want to simplify this as much as possible: it's messy to have to define a separate variable to hold the value in name[], coerce it to a std::string, and only then do the comparison. There must be some way of simplifying the notation so it looks more like the following (which, it should be noted, DOES NOT WORK!)...

fastfunc_bad1 <- cxxfunction(
    signature(DFin = "data.frame"),
    plugin = "Rcpp",
    body = '
        Rcpp::DataFrame DF(DFin);
        Rcpp::CharacterVector name = DF["name"];
        Rcpp::IntegerVector mark = DF["mark"];

        int n = name.length();
        int tot = 0;

        for (int i = 0; i < n; i++) {
            if (name[i] == name[0]) {
                tot = tot + mark[i];
            }
        }

        return(Rcpp::wrap(tot));
        ')

Ultimately the goal of this mini learning project is for me to figure out how to iterate through the unique names in df$name, compute the mark sum for each one, and return everything (the unique names and corresponding sums) as a neat data frame. I've figured out most of the nuts and bolts of how to build and return the final data frame from other examples - it's just the string stuff described above that's causing me headaches. Many thanks in advance for any pointers!

Upvotes: 1

Views: 1657

Answers (1)

Sameer

Reputation: 1807

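You can use Rcpp::as to convert R objects into C++ containers. Once the column is a std::vector<std::string>, each name[i] is an ordinary std::string, so == compares the contents directly. The following works for me.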

fastfunc_good2 <- cxxfunction(
    signature(DFin = "data.frame"),
    plugin = "Rcpp",
    body = '
        Rcpp::DataFrame DF(DFin);
        std::vector<std::string> name = Rcpp::as<std::vector<std::string> >(DF["name"]);
        std::vector<int> mark = Rcpp::as<std::vector<int> >(DF["mark"]);

        int n = name.size();
        int tot = 0;

        for (int i = 0; i < n; i++) {
            if (name[i] == name[0]) {
                tot = tot + mark[i];
            }
        }

        return(Rcpp::wrap(tot));
        ')


> fastfunc_good1(df)
[1] 1040

> fastfunc_good2(df)
[1] 1040
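
For the larger goal mentioned in the question (a sum of marks for every unique name), the same conversion to standard containers makes the grouping step straightforward. Below is a minimal sketch in plain C++, assuming the two columns have already been converted with Rcpp::as as above. The function name sum_marks_by_name is hypothetical and not part of Rcpp, and the step of wrapping the result back into an R data frame is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of Rcpp): total the marks for each distinct name.
// Assumes name and mark are parallel vectors, e.g. obtained via Rcpp::as.
std::map<std::string, int> sum_marks_by_name(const std::vector<std::string>& name,
                                             const std::vector<int>& mark) {
    // Precondition for this sketch: one mark per name.
    assert(name.size() == mark.size());

    std::map<std::string, int> totals;
    for (std::size_t i = 0; i < name.size(); ++i) {
        // operator[] inserts a zero-initialized entry the first time a name is seen.
        totals[name[i]] += mark[i];
    }
    return totals;
}
```

Iterating over the returned std::map visits the names in sorted order, so inside the Rcpp body the keys and values could be copied into two vectors and handed to Rcpp::DataFrame::create to build the result.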

Upvotes: 5
