Ben Engbers

Reputation: 453

is.numeric and is.integer give unexpected results

When querying my XML-database, in essence the result comes as a vector with one element per row.

input <- c("[1, 1.05e0, true(), \"1\", false()]", "[2, 4.0e0, true(), \"8\", true()]")  # ... more rows

After converting each element to a list, it is easy to add each element from the vector as a row to a dataframe. The only thing that remains is to convert each column to the proper type. My problem is that I don't know how to determine the type of each column.
I start by creating a template based on the first element from the input.

> template <- input[[1]] %>% str_replace_all("[\\[\\]]", "") %>% str_replace_all(", ", ",") %>%
+   str_replace_all("\"", "'") %>% strsplit(",") %>% .[[1]]
> template
[1] "1"       "1.05e0"  "true()"  "'1'"     "false()"

I then use this template to determine the column-type.

test_type <- function(template) {
  Bools <- which(template %in% c("true", "true()", "false", "false()"))
  NonBools <- setdiff(1:length(template), Bools)
  cat("Bools", "\n")
  for (i in Bools) {
    cat(i, "\n")
  }
  cat("NonBools", "\n")
  for (i in NonBools) {
    if (is.numeric(template[[i]])) { Type <- "Num"}
    else if (is.integer(template[[i]])) {Type <- "Int"}
    else {Type <- "Char"}
    cat(i, template[i], Type, "\n", sep = " ")
  }
}

> test_type(template)
Bools 
3 
5 
NonBools 
1 1 Char 
2 1.05e0 Char 
4 '1' Char

As you can see, my function does not return the right type. is.numeric(template[[1]]) returns FALSE, but as.numeric(template[[1]]) returns 1. as.numeric(template[[4]]) returns NA.

Can someone explain why is.numeric() returns the wrong answer? How can I determine the correct type?

Ben

Upvotes: 0

Views: 330

Answers (2)

Ronak Shah

Reputation: 389355

We can correct OP's function by using:

test_type <- function(template) {
  # Positions holding boolean literals
  Bools <- which(template %in% c("true", "true()", "false", "false()"))
  NonBools <- setdiff(seq_along(template), Bools)
  cat("Bools", "\n")
  for (i in Bools) {
    cat(i, "\n")
  }
  cat("NonBools", "\n")
  for (i in NonBools) {
    # Convert first; strings that are not numbers become NA (with a warning)
    num <- as.numeric(template[i])
    if (!is.na(num) && num %% 1 != 0) Type <- "Num"
    else if (!is.na(num) && num %% 1 == 0) Type <- "Int"
    else Type <- "Char"
    cat(i, template[i], Type, "\n", sep = " ")
  }
}


suppressWarnings(test_type(template))

#Bools 
#3 
#5 
#NonBools 
#1 1 Int 
#2 1.05e0 Num 
#4 '1' Char 

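Points to note:

  • is.numeric(template[[i]]) tests the type of the object, not its content. template[[i]] is a character string such as "1", so is.numeric always returns FALSE until the value is actually converted with as.numeric.

  • Integers also satisfy the is.numeric test. Check class(1L) and is.numeric(1L). In OP's function the is.integer branch could therefore never be reached, so we need some other test to check for integers.

  • Here we use num %% 1 == 0 on the converted value to test for integers.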
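The points above can be checked directly at the console. This is a minimal sketch in base R; the variable names x and num are only illustrative.

```r
x <- "1.05e0"                 # one element of the template, still a string

is.numeric(x)                 # FALSE: is.numeric() inspects the type, not the content
is.character(x)               # TRUE

num <- as.numeric(x)          # conversion parses the content
is.numeric(num)               # TRUE
num %% 1 == 0                 # FALSE: 1.05 has a fractional part

is.numeric(1L)                # TRUE: integers also pass is.numeric()
is.integer(1L)                # TRUE

# A quoted value does not parse as a number, so it converts to NA (with a warning)
is.na(suppressWarnings(as.numeric("'1'")))   # TRUE
```
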

Upvotes: 1

morgan121

Reputation: 2253

Here is how I would do it using case_when from the dplyr package:

template <- c("1", "1.05e0", "true()", "'1'", "false()")

dplyr::case_when(
  tolower(template) %in% c('true', 'false', 'true()', 'false()')  ~ 'Boolean',
  as.integer(template) == template ~ 'Integer',
  !is.na(as.numeric(template)) ~ 'Numeric',
  TRUE ~ 'Character')

#  "Integer"   "Numeric"   "Boolean"   "Character" "Boolean"  

This could also be done with if/else statements, but I think the case_when syntax is nicer.
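For comparison, a base R version with nested ifelse() might look roughly like the sketch below. guess_type is a hypothetical helper name, and the integer test here is num %% 1 == 0 rather than the as.integer() comparison above, so a value such as "1e0" would be labelled Integer.

```r
# guess_type() is a hypothetical helper name, not part of any package
guess_type <- function(template) {
  # Parse once; strings that are not numbers become NA
  num <- suppressWarnings(as.numeric(template))
  is_bool <- tolower(template) %in% c("true", "false", "true()", "false()")

  # Booleans first, then non-numbers, then split numbers by fractional part
  ifelse(is_bool, "Boolean",
    ifelse(is.na(num), "Character",
      ifelse(num %% 1 == 0, "Integer", "Numeric")))
}

guess_type(c("1", "1.05e0", "true()", "'1'", "false()"))
# "Integer"   "Numeric"   "Boolean"   "Character" "Boolean"
```
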

I also added tolower() around the template to make sure TRUE and FALSE are counted as boolean too.

Edit:

The integer check was not working, so it is now done a different way.

Upvotes: 0
