The function iterates over the elements of a factor and tries to convert each one to an integer, then to a double, and otherwise treats it as a character. For each element the corresponding counter is incremented. At the end, the string corresponding to the largest counter is returned.
This is mostly a learning example. I've come across a messy data.frame in which some of the data I want to use is stored as factors. The variables are in effect doubles, integers or strings, and I want to convert them to those types. There are better ways to do this in base R, but the problem looks like a nice opportunity to learn more Rcpp.
#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
//' @title Guess Vector Type
//'
//' @description Function analyses content of a factor vector and attempts to
//' guess the correct type.
//'
//' @param x A vector of factor class.
//'
//' @return A scalar string with class name.
//'
//' @export
//'
// [[Rcpp::export]]
Rcpp::String guess_vector_type(Rcpp::IntegerVector x) {

    // Define counters for all types
    int num_doubles = 0;
    int num_integers = 0;
    int num_strings = 0;

    // Converted strings
    double converted_double;
    int converted_integer;

    // Get character vector with levels
    Rcpp::StringVector levels = x.attr("levels");
    // Get integer vector with values
    // Rcpp::String type = x.sexp_type();
    // Returns integer vector type

    // Use iterator: https://teuder.github.io/rcpp4everyone_en/280_iterator.html
    for(Rcpp::IntegerVector::iterator it = x.begin(); it != x.end(); ++it) {
        // Get [] for vector element
        int index = std::distance(x.begin(), it);
        // Get value of a specific vector element
        int element = x[index];
        // Convert to normal string
        std::string temp = Rcpp::as<std::string>(levels[element]);

        // Try converting to an integer
        try
        {
            converted_integer = std::stoi(temp);
        }
        catch(...)
        {
            // Try converting to a double
            try
            {
                // Convert to a double
                converted_double = std::stod(temp);
            }
            catch(...)
            {
                ++num_integers;
            }
            ++num_doubles;
        }
        ++num_strings;
    }

    // Get max value of three variables
    // https://stackoverflow.com/a/2233412/1655567
    int max_val;
    max_val = num_doubles > num_integers? (num_doubles > num_strings? num_doubles: num_strings): (num_integers > num_strings? num_integers: num_strings);

    // Create results storage
    Rcpp::String res;

    // Check which value is matching max val
    if (max_val == num_doubles) {
        // Most converted to doubles
        res = "double";
    } else if (max_val == num_integers) {
        res = "integer";
    } else {
        res = "character";
    }

    // Return results vector
    return res;
}
test_factor <- as.factor(rep(letters, 3))
Should return the scalar string "character".
guess_vector_type(test_factor)
*** caught segfault ***
address 0xe1000013, cause 'memory not mapped'
I understand this is similar to the problem discussed here, but it's not clear to me where the mistake is.
Following the comments, I've updated the function:
Rcpp::String guess_vector_type(Rcpp::IntegerVector x) {

    // Define counters for all types
    int num_doubles = 0;
    int num_integers = 0;
    int num_strings = 0;

    // Converted string
    double converted_double = 0.0;

    // Flag for running more tests
    bool is_number;

    // Get character vector with levels
    Rcpp::StringVector levels = x.attr("levels");
    // Get integer vector with values
    // Rcpp::String type = x.sexp_type();
    // Returns integer vector type

    // Use iterator: https://teuder.github.io/rcpp4everyone_en/280_iterator.html
    for(Rcpp::IntegerVector::iterator it = x.begin(); it != x.end(); ++it) {
        // Get [] for vector element
        int index = std::distance(x.begin(), it);
        // Get value of a specific vector element
        int element = x[index];
        // Convert to normal string (factor codes are 1-based)
        std::string temp = Rcpp::as<std::string>(levels[element - 1]);

        // Reset number checking flag
        is_number = true;

        // Attempt conversion to double
        try {
            converted_double = std::stod(temp);
        } catch(...) {
            // Conversion failed, increase string count
            ++num_strings;
            // Do not run more tests
            is_number = false;
        }

        // If the element is a number, run more tests
        if (is_number) {
            // Check if converted string is an integer
            if (std::floor(converted_double) == converted_double) {
                // Increase counter for integers
                ++num_integers;
            } else {
                // Increase counter for doubles
                ++num_doubles;
            }
        }
    }

    // Get max value of three variables
    // https://stackoverflow.com/a/2233412/1655567
    int max_val;
    max_val = num_doubles > num_integers? (num_doubles > num_strings? num_doubles: num_strings): (num_integers > num_strings? num_integers: num_strings);

    // Create results storage
    Rcpp::String res;

    // Check which value is matching max val
    if (max_val == num_doubles) {
        // Most converted to doubles
        res = "double";
    } else if (max_val == num_integers) {
        res = "integer";
    } else {
        res = "character";
    }

    // Return results vector
    return res;
}
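As a side note, the nested conditional that finds the largest counter can also be written with std::max on an initializer list, which is available from C++11 (already enabled by the plugin line). The following is only a minimal sketch in plain C++ without Rcpp; pick_type is a hypothetical helper name, not part of the function above. It keeps the same tie-breaking order as the if/else chain, so ties go to "double" first, then "integer".

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not in the original code): returns the label of the
// largest counter. Ties are resolved in the order double, integer, character,
// which mirrors the if/else chain used in the function above.
std::string pick_type(int num_doubles, int num_integers, int num_strings) {
    // std::max accepts an initializer list since C++11
    const int max_val = std::max({num_doubles, num_integers, num_strings});

    if (max_val == num_doubles) {
        return "double";
    }
    if (max_val == num_integers) {
        return "integer";
    }
    return "character";
}
```

One consequence of this ordering is that an input with no elements (all counters at zero) is reported as "double", so an explicit check for the empty case may be worth adding.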
Tests
>> guess_vector_type(x = as.factor(letters))
[1] "character"
>> guess_vector_type(as.factor(1:10))
[1] "integer"
>> guess_vector_type(as.factor(runif(n = 1e3)))
[1] "double"
Upvotes: 1
Views: 434
Reputation: 16930
The problem causing your segfault is with this line:
std::string temp = Rcpp::as<std::string>(levels[element]);
Factor codes are 1-based because R is 1-indexed, whereas C++ indexing starts at 0, so you need
std::string temp = Rcpp::as<std::string>(levels[element - 1]);
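To make the off-by-one explicit, here is a minimal sketch in plain C++ (no Rcpp) of looking up a label from a 1-based code. The name level_for_code and the use of std::vector in place of the "levels" attribute are assumptions made for illustration only. The bounds check is an addition of mine: it turns an out-of-range code into an exception instead of an invalid memory read.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: look up the label for a 1-based factor code.
// `levels` stands in for the "levels" attribute and `code` for one element
// of the factor's underlying integer vector.
std::string level_for_code(const std::vector<std::string>& levels, int code) {
    // Valid codes run from 1 to levels.size(); anything else would index
    // outside the vector once shifted to 0-based indexing.
    if (code < 1 || static_cast<std::size_t>(code) > levels.size()) {
        throw std::out_of_range("factor code outside 1..n_levels");
    }
    return levels[code - 1];  // shift from 1-based to 0-based
}
```

With levels {"a", "b", "c"}, code 1 maps to "a" and code 3 maps to "c", while code 4 throws rather than reading past the end.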
However, I also noticed that you increment your counters in the wrong places: the string counter should be incremented in the innermost catch, the double counter after the inner try, and the integer counter after the outer try/catch. You also need a continue statement after each increment inside the catch block; otherwise execution falls through and the later counters are incremented as well. Once you fix those things, the code runs as expected on the test case (but see the update at the end regarding doubles vs. integers).
guess_vector_type(test_factor)
# [1] "character"
The full working code is:
#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
//' @title Guess Vector Type
//'
//' @description Function analyses content of a factor vector and attempts to
//' guess the correct type.
//'
//' @param x A vector of factor class.
//'
//' @return A scalar string with class name.
//'
//' @export
//'
// [[Rcpp::export]]
Rcpp::String guess_vector_type(Rcpp::IntegerVector x) {

    // Define counters for all types
    int num_doubles = 0;
    int num_integers = 0;
    int num_strings = 0;

    // Converted strings
    double converted_double;
    int converted_integer;

    // Get character vector with levels
    Rcpp::StringVector levels = x.attr("levels");
    // Get integer vector with values
    // Rcpp::String type = x.sexp_type();
    // Returns integer vector type

    // Use iterator: https://teuder.github.io/rcpp4everyone_en/280_iterator.html
    for(Rcpp::IntegerVector::iterator it = x.begin(); it != x.end(); ++it) {
        // Get [] for vector element
        int index = std::distance(x.begin(), it);
        // Get value of a specific vector element
        int element = x[index];
        // Convert to normal string (factor codes are 1-based)
        std::string temp = Rcpp::as<std::string>(levels[element - 1]);

        // Try converting to an integer
        try
        {
            converted_integer = std::stoi(temp);
        }
        catch(...)
        {
            // Try converting to a double
            try
            {
                converted_double = std::stod(temp);
            }
            catch(...)
            {
                // Neither conversion worked: count as a string
                ++num_strings;
                continue;
            }
            // Only the double conversion worked
            ++num_doubles;
            continue;
        }
        // The integer conversion worked
        ++num_integers;
    }

    // Get max value of three variables
    // https://stackoverflow.com/a/2233412/1655567
    int max_val;
    max_val = num_doubles > num_integers? (num_doubles > num_strings? num_doubles: num_strings): (num_integers > num_strings? num_integers: num_strings);

    // Create results storage
    Rcpp::String res;

    // Check which value is matching max val
    if (max_val == num_doubles) {
        // Most converted to doubles
        res = "double";
    } else if (max_val == num_integers) {
        res = "integer";
    } else {
        res = "character";
    }

    // Return results vector
    return res;
}
I tried it on some more examples and found that it doesn't work quite as expected for doubles: std::stoi stops at the first character that cannot be part of an integer, so it converts "42.18" to 42 without throwing, and the element is counted as an integer. It does cleanly discern between integers/doubles and characters though:
test_factor <- as.factor(rep(letters, 3))
guess_vector_type(test_factor)
# [1] "character"
test_factor <- as.factor(1:3)
guess_vector_type(test_factor)
# [1] "integer"
test_factor <- as.factor(c(letters, 1))
guess_vector_type(test_factor)
# [1] "character"
test_factor <- as.factor(c(1.234, 42.1138, "a"))
guess_vector_type(test_factor)
# [1] "integer"
In any event, that's an entirely separate issue from the one presented in the question, for which you may want to consult this Stack Overflow post, for example.
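For what it's worth, one possible way to tighten the integer/double distinction is to use the optional second argument of std::stoi and std::stod, which reports how many characters were consumed, and to accept a conversion only if the whole string was used. The sketch below is plain C++ without Rcpp, and classify is a hypothetical helper name, not something from the code above. Treat it as a starting point rather than a complete solution: std::stod also accepts inputs such as "inf", "nan" and hexadecimal floats, and both functions skip leading whitespace.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: classify a single string as "integer", "double" or
// "character". A conversion only counts if it consumed the entire string.
std::string classify(const std::string& s) {
    std::size_t pos = 0;

    try {
        (void)std::stoi(s, &pos);
        if (pos == s.size()) {
            return "integer";
        }
        // e.g. "42.18": stoi stopped after "42", so fall through to stod
    } catch (...) {
        // Not an integer, or out of range for int; try double next
    }

    try {
        (void)std::stod(s, &pos);
        if (pos == s.size()) {
            return "double";
        }
    } catch (...) {
        // Not a number at all
    }

    return "character";
}
```

Under this scheme "42" is classified as an integer, "42.18" as a double, and both "abc" and "12abc" as characters.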
Upvotes: 3