Check whether an element in a character vector can be converted to numeric in R

Question

How can I check whether an element of a character vector can be converted to numeric or not? To be more precise, when the element is a float or an integer it can be converted to numeric without any problems, but when it is any other string, as.numeric() gives the warning “NAs introduced by coercion”. I was able to check indirectly by looking at the index of the NA values. However, it would be much cleaner to do this without getting a warning.

cat1 <- c("1.12354","1.4548","1.9856","some_string")
cat2 <- c("1.45678","1.1478","1.9565","1.32315")
target <- c(0,1,1,0)
df <- data.frame(cat1, cat2, target)
catCols <- c("cat1", "cat2")

for (col in catCols) {
  a <- as.numeric(unique(df[[col]]))
  if (length(which(is.na(a))) != 0) {
    print(col)
    print(which(is.na(a)))
  }
}

Ronak Shah · Accepted Answer

Perhaps, you can use regex to find if all the values in a column are either an integer or float.

can_convert_to_numeric <- function(x) {
  # The backslash must be doubled inside an R string literal
  all(grepl('^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$', x, perl = TRUE))
}

sapply(df[catCols], can_convert_to_numeric)
# cat1  cat2 
#FALSE  TRUE

Alternatively, to get the values that cannot be converted to numeric, we can use grep with invert = TRUE:

values_which_cannot_be_numeric <- function(x) {
  # invert = TRUE returns the elements that do NOT match the numeric pattern
  grep('^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$', x, perl = TRUE, invert = TRUE, value = TRUE)
}

lapply(df[catCols], values_which_cannot_be_numeric)

#$cat1
#[1] "some_string"

#$cat2
#character(0)

Regex taken from here.
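
Note that the regex only accepts plain signed decimals, so it rejects strings such as "1e5" that as.numeric() converts without complaint. If silencing the warning is acceptable, a minimal sketch of an element-wise check could look like the following. The helper name is_convertible is hypothetical, not part of base R.

```r
# Hypothetical helper: TRUE where as.numeric() would succeed.
is_convertible <- function(x) {
  # Silence the coercion warning for this call only.
  converted <- suppressWarnings(as.numeric(x))
  # A value fails only if it became NA without being NA beforehand.
  !is.na(converted) | is.na(x)
}

x <- c("1.12354", "1e5", "-3", "some_string")
is_convertible(x)
# [1]  TRUE  TRUE  TRUE FALSE
```

This defers to R's own parsing rules instead of re-implementing them in a pattern, at the cost of suppressing any other warning raised by the same call.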

If you use type.convert, you don't have to worry about this at all: with as.is = TRUE it converts the columns that can be converted and leaves the rest as character.

df <- type.convert(df, as.is = TRUE)
str(df)

#'data.frame':  4 obs. of  3 variables:
# $ cat1  : chr  "1.12354" "1.4548" "1.9856" "some_string"
# $ cat2  : num  1.46 1.15 1.96 1.32
# $ target: int  0 1 1 0
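
The type.convert result can also be used to answer the original question per column. A rough sketch, assuming the columns start out as character; the names raw_df, converted and still_character are made up for illustration:

```r
# Rebuild the example data with character columns.
raw_df <- data.frame(
  cat1 = c("1.12354", "1.4548", "1.9856", "some_string"),
  cat2 = c("1.45678", "1.1478", "1.9565", "1.32315"),
  stringsAsFactors = FALSE
)

converted <- type.convert(raw_df, as.is = TRUE)

# Columns that are still character could not be converted as a whole.
still_character <- names(converted)[sapply(converted, is.character)]
still_character
# [1] "cat1"
```

No warning is raised here, because type.convert only changes a column's type when every value in it can be converted.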
