Konrad

Reputation: 18585

Simple Rcpp function with try catch returning 'memory not mapped' error

Background

The function has a simple task: iterate over the elements of a factor and attempt to convert each element to a double or an integer, otherwise leaving it as a character. For each element the respective counter is incremented. At the end, the string corresponding to the largest counter is returned.

Rationale

This is mostly a learning example. I've come across a messy data.frame in which some of the data I want to use is stored as factors. The variables are in effect doubles, integers or strings, and I want to convert them to those types. There are better ways to do this in base R, but this problem looks like a nice opportunity to learn more.

Code

#include <Rcpp.h>

// [[Rcpp::plugins(cpp11)]]

//' @title Guess Vector Type
//'
//' @description Function analyses content of a factor vector and attempts to
//'   guess the correct type.
//'
//' @param x A vector of factor class.
//'
//' @return A scalar string with class name.
//'
//' @export
//'
// [[Rcpp::export]]
Rcpp::String guess_vector_type(Rcpp::IntegerVector x) {

    // Define counters for all types
    int num_doubles = 0;
    int num_integers = 0;
    int num_strings = 0;

    // Converted strings
    double converted_double;
    int converted_integer;


    // Get character vector with levels
    Rcpp::StringVector levels = x.attr("levels");
    // Get integer vector with values
    // Rcpp::String type = x.sexp_type();
    // Returns integer vector type
    // Use iterator: https://teuder.github.io/rcpp4everyone_en/280_iterator.html
    for(Rcpp::IntegerVector::iterator it = x.begin(); it != x.end(); ++it) {
        // Get [] for vector element
        int index = std::distance(x.begin(), it);
        // Get value of a specific vector element
        int element = x[index];
        // Convert to normal string
        std::string temp = Rcpp::as<std::string>(levels[element]);
        // Try converting to an integer
        try
        {
            converted_integer = std::stoi(temp);
        }
        catch(...)
        {
            // Integer conversion failed, try converting to a double
            try
            {
                // Convert to a double
                converted_double = std::stod(temp);
            }
            catch(...)
            {
                ++num_integers;
            }
            ++num_doubles;
        }
        ++num_strings;

    }

    // Get max value of three variables
    // https://stackoverflow.com/a/2233412/1655567
    int max_val;
    max_val = num_doubles > num_integers? (num_doubles > num_strings? num_doubles: num_strings): (num_integers > num_strings? num_integers: num_strings);

    // Create results storage
    Rcpp::String res;


    // Check which value is matching max val
    if (max_val == num_doubles) {
        // Most converted to doubles
        res = "double";

    } else if (max_val == num_integers) {
        res = "integer";
    } else {
        res = "character";
    }

    // Return results vector
    return res;
}

Tests

test_factor <- as.factor(rep(letters, 3))

Should return scalar string "character".

Error

guess_vector_type(test_factor)

 *** caught segfault ***
address 0xe1000013, cause 'memory not mapped'

I understand this is similar to the problem discussed here, but it's not clear to me where the mistake is.


Updates

Following the comments, I've updated the function:

Rcpp::String guess_vector_type(Rcpp::IntegerVector x) {

    // Define counters for all types
    int num_doubles = 0;
    int num_integers = 0;
    int num_strings = 0;

    // Converted strings
    double converted_double;

    // Flag for running more tests
    bool is_number;

    // Get character vector with levels
    Rcpp::StringVector levels = x.attr("levels");
    // Get integer vector with values
    // Rcpp::String type = x.sexp_type();
    // Returns integer vector type
    // Use iterator: https://teuder.github.io/rcpp4everyone_en/280_iterator.html
    for(Rcpp::IntegerVector::iterator it = x.begin(); it != x.end(); ++it) {
        // Get [] for vector element
        int index = std::distance(x.begin(), it);
        // Get value of a specific vector element
        int element = x[index];
        // Convert to normal string
        std::string temp = Rcpp::as<std::string>(levels[element - 1]);

        // Reset number checking flag
        is_number = true;

        // Attempt conversion to double
        try {
            converted_double = std::stod(temp);
        } catch(...) {
            // Conversion failed, increase string count
            ++num_strings;
            // Do not run more tests
            is_number = false;
        }

        // If the element is a number, run more tests
        if (is_number) {
            // Check if converted string is an integer
            if(floor(converted_double) == converted_double) {
                // Increase counter for integer
                ++num_integers;
            } else {
                // Increase count for doubles
                ++num_doubles;
            }
        }
    }

    // Get max value of three variables
    // https://stackoverflow.com/a/2233412/1655567
    int max_val;
    max_val = num_doubles > num_integers? (num_doubles > num_strings? num_doubles: num_strings): (num_integers > num_strings? num_integers: num_strings);

    // Create results storage
    Rcpp::String res;


    // Check which value is matching max val
    if (max_val == num_doubles) {
        // Most converted to doubles
        res = "double";

    } else if (max_val == num_integers) {
        res = "integer";
    } else {
        res = "character";
    }

    // Return results vector
    return res;
}

Tests

> guess_vector_type(x = as.factor(letters))
[1] "character"
> guess_vector_type(as.factor(1:10))
[1] "integer"
> guess_vector_type(as.factor(runif(n = 1e3)))
[1] "double"

Upvotes: 1

Views: 434

Answers (1)

duckmayr

Reputation: 16930

The problem causing your segfault is this line:

std::string temp = Rcpp::as<std::string>(levels[element]);

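Factor codes in R are 1-based, whereas C++ indexing is 0-based, so levels[element] reads one past the end of levels whenever element equals the number of levels. You need: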

std::string temp = Rcpp::as<std::string>(levels[element - 1]);
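The off-by-one can be illustrated without Rcpp. The following is a minimal sketch in plain C++, not the code from the question: level_for_code is a hypothetical helper, and the levels vector stands in for the factor's "levels" attribute. It uses at(), which throws std::out_of_range for a bad index instead of reading memory outside the vector.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: map a 1-based factor code to its level label.
// `levels` stands in for the factor's "levels" attribute.
std::string level_for_code(const std::vector<std::string>& levels, int code) {
    // Factor codes run from 1 to levels.size(); C++ indices start at 0,
    // so subtract one. at() throws std::out_of_range for any code that
    // does not map to a valid index (0, negative, or too large).
    return levels.at(static_cast<std::size_t>(code) - 1);
}
```

With three levels "a", "b", "c", code 1 yields "a" and code 3 yields "c"; codes 0 and 4 throw rather than touching unmapped memory.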

However, I also noticed that you increment your counters in the wrong places: the string counter belongs in the innermost catch, the double counter after the inner try/catch, and the integer counter after the outer try/catch. The string and double increments must each be followed by a continue statement; otherwise an element that fails a conversion falls through and increments the other counters as well. Once you fix those things, the code runs as expected on the test case (but see the updates at the end regarding doubles vs. integers).

guess_vector_type(test_factor)
# [1] "character"
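One way to make the "exactly one counter per element" rule hard to get wrong is to move the per-element decision into a helper that returns a single result. This is a sketch in plain C++ under that assumption, not the code from the question; the names Kind and classify are hypothetical. It keeps the same stoi-then-stod order, so it also inherits that order's limitations.

```cpp
#include <string>

// Hypothetical result type: each element gets exactly one label.
enum class Kind { Integer, Double, String };

// Hypothetical helper: try an integer first, then a double,
// and fall back to a string if both conversions throw.
Kind classify(const std::string& s) {
    try {
        (void)std::stoi(s);   // throws if s does not start with an integer
        return Kind::Integer;
    } catch (...) {
        // Not an integer; fall through to the double attempt.
    }
    try {
        (void)std::stod(s);   // throws if s does not start with a number
        return Kind::Double;
    } catch (...) {
        return Kind::String;
    }
}
```

The caller can then increment one counter per returned value, with no need for continue statements inside nested catch blocks.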

Full working code is

#include <Rcpp.h>

// [[Rcpp::plugins(cpp11)]]

//' @title Guess Vector Type
//'
//' @description Function analyses content of a factor vector and attempts to
//'   guess the correct type.
//'
//' @param x A vector of factor class.
//'
//' @return A scalar string with class name.
//'
//' @export
//'
// [[Rcpp::export]]
Rcpp::String guess_vector_type(Rcpp::IntegerVector x) {

    // Define counters for all types
    int num_doubles = 0;
    int num_integers = 0;
    int num_strings = 0;

    // Converted strings
    double converted_double;
    int converted_integer;


    // Get character vector with levels
    Rcpp::StringVector levels = x.attr("levels");
    // Get integer vector with values
    // Rcpp::String type = x.sexp_type();
    // Returns integer vector type
    // Use iterator: https://teuder.github.io/rcpp4everyone_en/280_iterator.html
    for(Rcpp::IntegerVector::iterator it = x.begin(); it != x.end(); ++it) {
        // Get [] for vector element
        int index = std::distance(x.begin(), it);
        // Get value of a specific vector element
        int element = x[index];
        // Convert to normal string
        std::string temp = Rcpp::as<std::string>(levels[element - 1]);
        // Try converting to an integer
        try
        {
            converted_integer = std::stoi(temp);
        }
        catch(...)
        {
            // Integer conversion failed, try converting to a double
            try
            {
                // Convert to a double
                converted_double = std::stod(temp);
            }
            catch(...)
            {
                ++num_strings;
                continue;
            }
            ++num_doubles;
            continue;
        }
        ++num_integers;
    }

    // Get max value of three variables
    // https://stackoverflow.com/a/2233412/1655567
    int max_val;
    max_val = num_doubles > num_integers? (num_doubles > num_strings? num_doubles: num_strings): (num_integers > num_strings? num_integers: num_strings);

    // Create results storage
    Rcpp::String res;


    // Check which value is matching max val
    if (max_val == num_doubles) {
        // Most converted to doubles
        res = "double";

    } else if (max_val == num_integers) {
        res = "integer";
    } else {
        res = "character";
    }

    // Return results vector
    return res;
}

Updates

I tried it on some more examples and found that it doesn't work quite as expected for doubles: std::stoi stops at the first character it cannot parse, so "42.18" (for example) converts to the integer 42 without throwing. It does cleanly discern between integers/doubles and characters though:

test_factor <- as.factor(rep(letters, 3))
guess_vector_type(test_factor)
# [1] "character"

test_factor <- as.factor(1:3)
guess_vector_type(test_factor)
# [1] "integer"

test_factor <- as.factor(c(letters, 1))
guess_vector_type(test_factor)
# [1] "character"

test_factor <- as.factor(c(1.234, 42.1138, "a"))
guess_vector_type(test_factor)
# [1] "integer"

In any event, that's an entirely separate issue from the one presented in the question, for which you may want to consult this Stack Overflow post, for example.
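For reference, one possible way to separate integers from doubles is to check that the conversion consumed the whole string. Both std::stoi and std::stod take an optional second argument that receives the number of characters processed. The sketch below is plain C++ under that approach, not the linked post's method; ParsedAs and classify_strict are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for the stricter check.
enum class ParsedAs { Integer, Double, String };

// Hypothetical helper: a conversion only counts if it consumed
// every character of the input.
ParsedAs classify_strict(const std::string& s) {
    std::size_t pos = 0;
    try {
        (void)std::stoi(s, &pos);          // pos = characters consumed
        if (pos == s.size()) return ParsedAs::Integer;
    } catch (...) {
        // Not parseable as int (or out of int range); try double next.
    }
    try {
        (void)std::stod(s, &pos);
        if (pos == s.size()) return ParsedAs::Double;
    } catch (...) {
        // Not parseable as double either.
    }
    return ParsedAs::String;
}
```

Under this check "42" is an integer, "42.18" is a double, and "42abc" is a string. Note that std::stod also accepts inputs such as "inf", "nan" and hexadecimal floats, which may or may not be what you want for messy data.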

Upvotes: 3
