
is.numeric and is.integer give unexpected results

When I query my XML database, the result is essentially a character vector with one element per row.

input <- c("[1, 1.05e0, true(), \"1\", false()]", "[2, 4.0e0, true(), \"8\", true()]")  # ... more rows

After converting each element to a list, it is easy to add each element of the vector as a row to a data frame. The only thing that remains then is to convert each column to the proper type. My problem is that I don't know how to determine the type of each column.
I start by creating a template based on the first element from the input.

> library(stringr)  # str_replace_all(); stringr also re-exports the %>% pipe
> template <- input[[1]] %>% str_replace_all("[\\[\\]]", "") %>% str_replace_all(", ", ",") %>%
+   str_replace_all("\"", "'") %>% strsplit(",") %>% .[[1]]
> template
[1] "1"       "1.05e0"  "true()"  "'1'"     "false()"
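The row-to-data-frame step described above can be sketched in base R by applying the same cleanup as the template to every row. This is a minimal sketch, assuming every row has the same number of fields as the sample input and that no quoted string field itself contains ", " (the split would break it apart). All columns still come out as character, which is exactly why the column types have to be determined afterwards.

```r
# Sample rows copied from the question; the real query returns more rows
input <- c("[1, 1.05e0, true(), \"1\", false()]",
           "[2, 4.0e0, true(), \"8\", true()]")

# Same cleanup as the template, applied to every row:
# strip the outer brackets, split on ", ", then replace \" with '
rows <- strsplit(gsub("^\\[|\\]$", "", input), ", ", fixed = TRUE)
rows <- lapply(rows, function(r) gsub("\"", "'", r, fixed = TRUE))

# One data-frame row per input element; every column is still character
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
str(df)
```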

I then use this template to determine the column type.

test_type <- function(template) {
  Bools <- which(template %in% c("true", "true()", "false", "false()"))
  NonBools <- setdiff(1:length(template), Bools)
  cat("Bools", "\n")
  for (i in Bools) {
    cat(i, "\n")
  }
  cat("NonBools", "\n")
  for (i in NonBools) {
    if (is.numeric(template[[i]])) { Type <- "Num"}
    else if (is.integer(template[[i]])) {Type <- "Int"}
    else {Type <- "Char"}
    cat(i, template[i], Type, "\n", sep = " ")
  }
}

> test_type(template)
Bools 
3 
5 
NonBools 
1 1 Char 
2 1.05e0 Char 
4 '1' Char

As you can see, my function does not return the right type: is.numeric(template[[1]]) returns FALSE, but as.numeric(template[[1]]) returns 1, while as.numeric(template[[4]]) returns NA.

Can someone explain why is.numeric() returns the wrong answer? How can I determine the correct type?
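The last step, converting each column to its proper type, can also be handed to base R's utils::type.convert(), which for each column picks the first of logical, integer, double (or complex) that accepts every value and otherwise, with as.is = TRUE, leaves the strings as character. This is a swapped-in technique, not the template approach used here. The data frame is a hypothetical stand-in for the parsed rows, and the sketch assumes the XML booleans are always spelled true()/false().

```r
# Hypothetical character columns, shaped like the parsed rows above
df <- data.frame(V1 = c("1", "2"),
                 V2 = c("1.05e0", "4.0e0"),
                 V3 = c("true()", "true()"),
                 V4 = c("'1'", "'8'"),
                 V5 = c("false()", "true()"),
                 stringsAsFactors = FALSE)

# type.convert() understands true/false but not the XML-style true()/false(),
# so drop "()" first; as.is = TRUE keeps non-convertible text as character
df[] <- lapply(df, function(col) {
  type.convert(sub("()", "", col, fixed = TRUE), as.is = TRUE)
})

# V1 integer, V2 numeric, V3 logical, V4 character, V5 logical
sapply(df, class)
```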

Ben


Answers (2)

Ronak Shah


We can correct OP's function by using:

test_type <- function(template) {
  # Booleans are recognised by their literal spelling
  Bools <- which(template %in% c("true", "true()", "false", "false()"))
  NonBools <- setdiff(1:length(template), Bools)
  cat("Bools", "\n")
  for (i in Bools) {
    cat(i, "\n")
  }
  cat("NonBools", "\n")
  for (i in NonBools) {
    # Parse the string; NA means it is not a number at all
    num <- as.numeric(template[i])
    if (!is.na(num) && num %% 1 != 0) Type <- "Num"        # has a fractional part
    else if (!is.na(num) && num %% 1 == 0) Type <- "Int"   # whole number
    else Type <- "Char"
    cat(i, template[i], Type, "\n", sep = " ")
  }
}


suppressWarnings(test_type(template))

#Bools 
#3 
#5 
#NonBools 
#1 1 Int 
#2 1.05e0 Num 
#4 '1' Char

Points to note :

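When we check is.numeric(template[[i]]), template[[i]] is still a character string; its class has not changed. is.numeric() tests the type of the object, not whether its contents look like a number, so it always returns FALSE here.
Integers also pass the is.numeric() test: class(1L) is "integer", yet is.numeric(1L) is TRUE. Conversely, as.numeric() always returns a double, so is.integer(as.numeric("1")) is FALSE. So we need a different test for integers.
Here we use num %% 1 == 0 to test whether the parsed number is a whole number.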
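The points above can be checked directly in the console. Every value here is a plain literal, so nothing beyond base R is assumed.

```r
x <- "1"
is.numeric(x)                          # FALSE: x is a character string
as.numeric(x)                          # 1
suppressWarnings(as.numeric("'1'"))    # NA: the quotes cannot be parsed

class(1L)                              # "integer"
is.numeric(1L)                         # TRUE: integers count as numeric too
is.integer(1)                          # FALSE: a bare 1 is a double
is.integer(as.numeric("1"))            # FALSE: as.numeric() returns a double

# Whole-number test used in this answer
num <- as.numeric(c("1", "1.05e0"))
num %% 1 == 0                          # TRUE FALSE
```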


morgan121


Here is how I would do it using case_when from the dplyr package:

template <- c("1", "1.05e0", "true()", "'1'", "false()")

dplyr::case_when(
  tolower(template) %in% c('true', 'false', 'true()', 'false()')  ~ 'Boolean',
  as.integer(template) == template ~ 'Integer',
  !is.na(as.numeric(template)) ~ 'Numeric',
  TRUE ~ 'Character')

#  "Integer"   "Numeric"   "Boolean"   "Character" "Boolean"

This could also be done with if/else statements, but I think the case_when syntax is nicer.
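For comparison, here is a minimal base-R sketch of the if/else variant, keeping the same order of tests as the case_when() call, including the as.integer(x) == x integer check. The helper name classify() is hypothetical. It handles one string at a time, so vapply() applies it across template; suppressWarnings() only hides the "NAs introduced by coercion" warnings raised for non-numeric strings.

```r
# Hypothetical helper: classify a single string, same test order as case_when()
classify <- function(x) {
  if (tolower(x) %in% c("true", "false", "true()", "false()")) {
    "Boolean"
  } else if (isTRUE(suppressWarnings(as.integer(x)) == x)) {
    "Integer"   # string is written exactly as R prints that integer
  } else if (!is.na(suppressWarnings(as.numeric(x)))) {
    "Numeric"
  } else {
    "Character"
  }
}

template <- c("1", "1.05e0", "true()", "'1'", "false()")
types <- vapply(template, classify, character(1), USE.NAMES = FALSE)
types
# "Integer" "Numeric" "Boolean" "Character" "Boolean"
```

isTRUE() turns the NA produced by comparing a failed conversion into FALSE, which mirrors how case_when() treats an NA condition as not matched.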

I also applied tolower() to template to make sure TRUE and FALSE are counted as Boolean too.

Edit:

The integer check was not working, so I now do it a different way.
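The integer check in the case_when() call relies on how == compares an integer with a string: R converts the integer to character first, so the test is TRUE only when the string is written exactly the way R prints that integer. A quick check follows; the value "1.0" is an added illustration, not from the question.

```r
x <- c("1", "1.05e0", "1.0")
as_int <- suppressWarnings(as.integer(x))   # 1 1 1 (any fraction is truncated)
as_int == x                                  # TRUE FALSE FALSE
```

So "1.0" is classified as "Numeric" here, whereas the num %% 1 == 0 test in the other answer would call it an integer.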

