Reputation: 453
When I query my XML database, the result essentially comes back as a character vector with one element per row.
input <- c("[1, 1.05e0, true(), \"1\", false()]", "[2, 4.0e0, true(), \"8\", true()]")  # ... more rows
After converting each element to a list, it is easy to add each one as a row to a data frame. The only thing that remains is to convert each column to the proper type. My problem is that I don't know how to determine the type of each column.
I start by creating a template based on the first element from the input.
> template <- input[[1]] %>% str_replace_all("[\\[\\]]", "") %>% str_replace_all(", ", ",") %>%
+ str_replace_all("\"", "'") %>% strsplit(",") %>% .[[1]]
> template
[1] "1" "1.05e0" "true()" "'1'" "false()"
I then use this template to determine the column-type.
test_type <- function(template) {
Bools <- which(template %in% c("true", "true()", "false", "false()"))
NonBools <- setdiff(1:length(template), Bools)
cat("Bools", "\n")
for (i in Bools) {
cat(i, "\n")
}
cat("NonBools", "\n")
for (i in NonBools) {
if (is.numeric(template[[i]])) { Type <- "Num"}
else if (is.integer(template[[i]])) {Type <- "Int"}
else {Type <- "Char"}
cat(i, template[i], Type, "\n", sep = " ")
}
}
> test_type(template)
Bools
3
5
NonBools
1 1 Char
2 1.05e0 Char
4 '1' Char
As you can see, my function does not return the right type. is.numeric(template[[1]]) returns FALSE, but as.numeric(template[[1]]) returns 1. as.numeric(template[[4]]) returns NA.
Can someone explain why is.numeric() returns the wrong answer? How can I determine the correct type?
Ben
Upvotes: 0
Views: 330
Reputation: 389355
We can correct OP's function by using:
test_type <- function(template) {
Bools <- which(template %in% c("true", "true()", "false", "false()"))
NonBools <- setdiff(1:length(template), Bools)
cat("Bools", "\n")
for (i in Bools) {
cat(i, "\n")
}
cat("NonBools", "\n")
for (i in NonBools) {
num <- as.numeric(template[i])
if (!is.na(num) && num %% 1 != 0) Type <- "Num"
else if (!is.na(num) && num %% 1 == 0) Type <- "Int"
else Type <- "Char"
cat(i, template[i], Type, "\n", sep = " ")
}
}
suppressWarnings(test_type(template))
#Bools
#3
#5
#NonBools
#1 1 Int
#2 1.05e0 Num
#4 '1' Char
Points to note:
When we check is.numeric(template[[i]]), template[[i]] is still a character string; its class has not changed. So is.numeric always returns FALSE here.
Integers also satisfy the is.numeric test (check class(1L) and is.numeric(1L)), so we need some other test to identify integers.
Here we use num %% 1 == 0 to test for whole-number values.
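To make the points above concrete, here is a small base R sketch. The variable names `x`, `num`, and `quoted` are only for illustration; the values come from the template in the question.

```r
x <- "1"                       # one element of the template, as produced by strsplit()

is.numeric(x)                  # FALSE: x is stored as character, whatever it contains
class(x)                       # "character"

num <- as.numeric(x)           # parse the string into a double
is.numeric(num)                # TRUE
is.integer(num)                # FALSE: as.numeric() always returns a double
is.numeric(1L)                 # TRUE: integers pass is.numeric() as well
num %% 1 == 0                  # TRUE: the value is a whole number

# A quoted value cannot be parsed, so it becomes NA (with a warning).
quoted <- suppressWarnings(as.numeric("'1'"))
is.na(quoted)                  # TRUE
```

In short, is.numeric() reports how a value is stored, while as.numeric() tries to parse what it contains.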
Upvotes: 1
Reputation: 2253
Here is how I would do it using case_when from the dplyr package:
template <- c("1", "1.05e0", "true()", "'1'", "false()")
dplyr::case_when(
tolower(template) %in% c('true', 'false', 'true()', 'false()') ~ 'Boolean',
as.integer(template) == template ~ 'Integer',
!is.na(as.numeric(template)) ~ 'Numeric',
TRUE ~ 'Character')
# "Integer" "Numeric" "Boolean" "Character" "Boolean"
This could also be done with if/else statements, but I think the case_when syntax is nicer.
I also wrapped the template in tolower() to make sure TRUE and FALSE are counted as Boolean as well.
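For comparison, the if/else variant could look roughly like the following base R sketch, which uses nested ifelse() calls. The function name `classify_strings` is hypothetical, and the integer check here uses the `%% 1` test from the other answer rather than the as.integer() comparison above.

```r
template <- c("1", "1.05e0", "true()", "'1'", "false()")

# Hypothetical helper: label each string by the type its content suggests.
classify_strings <- function(x) {
  is_bool <- tolower(x) %in% c("true", "false", "true()", "false()")
  # Unparseable strings become NA; the coercion warning is expected here.
  num <- suppressWarnings(as.numeric(x))
  ifelse(is_bool, "Boolean",
    ifelse(is.na(num), "Character",
      ifelse(num %% 1 == 0, "Integer", "Numeric")))
}

classify_strings(template)
# "Integer" "Numeric" "Boolean" "Character" "Boolean"
```

Note that this rule looks at the value only, so a string such as "4.0e0" would be labelled "Integer" because it parses to a whole number.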
Edit:
The integer check was not working, so it is now done a different way.
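Since the goal in the question is a data frame with typed columns, it may be simpler to decide the type per column rather than from the first row alone. The following is a minimal base R sketch of that idea, not either answer's method: it swaps in utils::type.convert() for the numeric columns. The helpers `split_row` and `convert_column` are hypothetical names, and the sketch assumes that no field contains a comma, that Booleans always appear as true() or false(), and that strings are always wrapped in double quotes.

```r
input <- c("[1, 1.05e0, true(), \"1\", false()]",
           "[2, 4.0e0, true(), \"8\", true()]")

# Hypothetical helper: split one "[a, b, c]" row into its fields.
# Assumes no field contains a comma.
split_row <- function(row) {
  row <- gsub("^\\[|\\]$", "", row)
  trimws(strsplit(row, ",", fixed = TRUE)[[1]])
}

# One character column per field (V1, V2, ...).
fields <- do.call(rbind, lapply(input, split_row))
df <- as.data.frame(fields, stringsAsFactors = FALSE)

# Hypothetical helper: convert one character column to its likely type.
convert_column <- function(col) {
  if (all(col %in% c("true()", "false()"))) {
    return(col == "true()")                 # logical
  }
  if (all(grepl("^\".*\"$", col))) {
    return(gsub("^\"|\"$", "", col))        # quoted values stay character
  }
  type.convert(col, as.is = TRUE)           # integer, double, or character
}

df[] <- lapply(df, convert_column)
str(df)
```

Deciding on the whole column avoids the case where the first row happens to hold a whole number in a column that is otherwise fractional.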
Upvotes: 0