Reputation: 861
How can I check whether each element of a character vector can be converted to numeric? To be more precise, when an element represents a float or an integer, it converts without any problem, but when it is an arbitrary string, the warning “NAs introduced by coercion” occurs. I was able to check indirectly by looking at the indices of the NA values. However, it would be much cleaner to do this without getting a warning.
cat1 <- c("1.12354","1.4548","1.9856","some_string")
cat2 <- c("1.45678","1.1478","1.9565","1.32315")
target <- c(0,1,1,0)
df <- data.frame(cat1, cat2, target)
catCols <- c("cat1", "cat2")
for (col in catCols) {
  a <- as.numeric(unique(df[[col]]))
  if (length(which(is.na(a))) != 0) {
    print(col)
    print(which(is.na(a)))
  }
}
Upvotes: 2
Views: 1513
Reputation: 76402
One solution is to write a function that returns the indices of the NA values produced by the conversion, and to apply it to the columns you want.
check_num <- function(x) {
  y <- suppressWarnings(as.numeric(x))
  if (anyNA(y)) {
    which(is.na(y))
  } else {
    invisible(NULL)
  }
}
lapply(df[catCols], check_num)
#$cat1
#[1] 4
#
#$cat2
#NULL
The function above returns NULL if all values can be converted to numeric. The next function uses the same method to determine which vector elements can be converted, but returns integer(0) if all of them can be.
check_num2 <- function(x) {
  y <- suppressWarnings(as.numeric(x))
  which(is.na(y))
}
lapply(df[catCols], check_num2)
#$cat1
#[1] 4
#
#$cat2
#integer(0)
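If a per-element TRUE/FALSE answer is more convenient than a vector of indices, the same idea can be sketched as a logical mask. This is only a minimal sketch and not part of the answer above: `is_convertible` is a hypothetical helper name, and it assumes that elements that are already NA should be reported as NA rather than as failed conversions.

```r
# Hypothetical helper: TRUE where an element converts to numeric.
is_convertible <- function(x) {
  x <- as.character(x)                  # guard against factor input
  y <- suppressWarnings(as.numeric(x))  # failed conversions become NA
  # Elements that were already NA stay NA; otherwise report whether
  # the conversion produced a non-NA number.
  ifelse(is.na(x), NA, !is.na(y))
}

cat1 <- c("1.12354", "1.4548", "1.9856", "some_string", NA)
res <- is_convertible(cat1)
print(res)
```

The mask can then be used directly for subsetting, for example `cat1[which(!res)]` to list the offending values.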
Upvotes: 4
Reputation: 388862
Perhaps you can use a regex to check whether all the values in a column are either integers or floats.
can_convert_to_numeric <- function(x) {
  all(grepl('^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$', x, perl = TRUE))
}
sapply(df[catCols], can_convert_to_numeric)
# cat1 cat2
#FALSE TRUE
Alternatively, to get the values that cannot be converted to numeric, we can use grep with invert = TRUE and value = TRUE:
values_which_cannot_be_numeric <- function(x) {
  grep('^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$', x, perl = TRUE, invert = TRUE, value = TRUE)
}
lapply(df[catCols], values_which_cannot_be_numeric)
#$cat1
#[1] "some_string"
#$cat2
#character(0)
Regex taken from here.
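Note that a pattern like this describes plain decimal notation only, so it may not always agree with what as.numeric accepts. The sketch below is an illustration rather than part of the answer: it runs the same regex next to an as.numeric check on a few hand-picked inputs (scientific notation, surrounding whitespace, a lone sign), which are assumptions chosen for the comparison.

```r
# Same pattern as in the answer above.
pattern <- '^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$'

# Hand-picked test inputs (assumed for illustration).
x <- c("1.5", "-2", ".5", "1e5", " 3 ", "+", "some_string")

# TRUE where the regex considers the string numeric.
by_regex <- grepl(pattern, x, perl = TRUE)

# TRUE where as.numeric() actually yields a number.
by_coercion <- !is.na(suppressWarnings(as.numeric(x)))

# Side-by-side comparison; rows where the two columns differ are the
# cases in which the regex and as.numeric() disagree.
print(data.frame(x, by_regex, by_coercion))
```

If the goal is to predict exactly what as.numeric will do, the coercion-based check is the safer reference; the regex is more useful when only a specific textual format should be accepted.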
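If you use type.convert, you don't have to worry about this at all: as the output below shows, a column is converted only when all of its values can be converted, and is otherwise left as character.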
df <- type.convert(df, as.is = TRUE)
str(df)
#'data.frame': 4 obs. of 3 variables:
# $ cat1 : chr "1.12354" "1.4548" "1.9856" "some_string"
# $ cat2 : num 1.46 1.15 1.96 1.32
# $ target: int 0 1 1 0
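The result of type.convert can also be used to report which columns could not be converted. The following is a minimal sketch under the assumption that "not converted" means "not numeric afterwards"; the variable names `converted` and `not_numeric` are made up for illustration, and the data frame is rebuilt so the example runs on its own.

```r
# Rebuild the example data (strings kept as character, not factors).
df <- data.frame(
  cat1 = c("1.12354", "1.4548", "1.9856", "some_string"),
  cat2 = c("1.45678", "1.1478", "1.9565", "1.32315"),
  target = c(0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Let type.convert() decide the type of each column.
converted <- type.convert(df, as.is = TRUE)

# Names of the columns that did not end up numeric (integer or double).
not_numeric <- names(converted)[!vapply(converted, is.numeric, logical(1))]
print(not_numeric)
```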
Upvotes: 4