Mine

Reputation: 861

Check whether an element in a character vector can be converted to numeric in R

How can I check whether an element of a character vector can be converted to numeric? More precisely, when an element holds a float or an integer, it converts without any problem, but when it holds non-numeric text, as.numeric() gives the warning “NAs introduced by coercion”. I can detect this indirectly through the index of the resulting NA values, but it would be much cleaner to do the check without triggering a warning.

cat1 <- c("1.12354","1.4548","1.9856","some_string")
cat2 <- c("1.45678","1.1478","1.9565","1.32315")
target <- c(0,1,1,0)
df <- data.frame(cat1, cat2, target)
catCols <- c("cat1", "cat2")

for(col in catCols){
  a <- as.numeric(unique(df[[col]]))
  if(length(which(is.na(a))) != 0){
    print(col)
    print(which(is.na(a)))
  }
}

Upvotes: 2

Views: 1513

Answers (2)

Rui Barradas

Reputation: 76402

A solution is to write a function returning the indices of the NA values to be applied to the columns you want.

check_num <- function(x){
  y <- suppressWarnings(as.numeric(x))
  if(anyNA(y)){
    which(is.na(y))
  } else invisible(NULL)
}
lapply(df[catCols], check_num)
#$cat1
#[1] 4
#
#$cat2
#NULL

The function above returns NULL (invisibly) if all values can be converted to numeric. The next function detects unconvertible elements the same way but returns integer(0) when every element can be converted.

check_num2 <- function(x){
  y <- suppressWarnings(as.numeric(x))
  which(is.na(y))
}
lapply(df[catCols], check_num2)
#$cat1
#[1] 4
#
#$cat2
#integer(0)
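
One caveat with both functions: a value that is already NA in the input is reported as well. A minimal sketch that flags only the NAs introduced by coercion is shown here. The name check_num3 and the sample vector are hypothetical and not part of the original answer, and the as.character() call is an added guard for factor input, where as.numeric() would return level codes.

```r
# Hypothetical helper: report only the positions where coercion introduced
# an NA, ignoring values that were already missing in the input.
check_num3 <- function(x){
  x <- as.character(x)                  # guard: factors would yield level codes
  y <- suppressWarnings(as.numeric(x))  # coerce quietly
  which(is.na(y) & !is.na(x))           # NA after coercion, but not before
}

# Illustrative input: one pre-existing NA and one unconvertible string
x <- c("1.5", NA, "some_string", "2")
check_num3(x)
```

In this sketch only the position of "some_string" should be returned, since the second element was already missing before coercion.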

Upvotes: 4

Ronak Shah

Reputation: 388862

Perhaps you can use a regex to check whether all the values in a column are written as an integer or a float.

can_convert_to_numeric <- function(x) {
  all(grepl('^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$', x, perl = TRUE))  
}

sapply(df[catCols], can_convert_to_numeric)
# cat1  cat2 
#FALSE  TRUE 

Alternatively, to get the values that cannot be converted to numeric, we can use grep with invert = TRUE and value = TRUE:

values_which_cannot_be_numeric <- function(x) {
  grep('^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$', x, perl = TRUE, invert = TRUE, value = TRUE)
}

lapply(df[catCols], values_which_cannot_be_numeric)

#$cat1
#[1] "some_string"

#$cat2
#character(0)

Regex taken from here.
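
The pattern and as.numeric() do not agree on every input, so it may be worth checking edge cases before relying on the regex alone. A small comparison sketch follows. The test strings and the names pattern, edge and cmp are illustrative assumptions, not from the original answer.

```r
# Same pattern as in the functions above
pattern <- '^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$'

# Illustrative edge cases: scientific notation, trailing dot,
# surrounding whitespace, a lone sign, and a plain decimal
edge <- c("1e5", "1.", " 2 ", "+", "3.14")

cmp <- data.frame(
  value      = edge,
  regex_ok   = grepl(pattern, edge, perl = TRUE),
  numeric_ok = !is.na(suppressWarnings(as.numeric(edge)))
)
cmp
```

Rows where regex_ok and numeric_ok differ are inputs the regex rejects although as.numeric() accepts them (such as scientific notation), or the reverse (such as a lone sign).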


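If you use type.convert, you don't have to do this check yourself: with as.is = TRUE it converts a column to numeric (or integer) only when all of its values can be converted, and leaves the other columns as character.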

df <- type.convert(df, as.is = TRUE)
str(df)

#'data.frame':  4 obs. of  3 variables:
# $ cat1  : chr  "1.12354" "1.4548" "1.9856" "some_string"
# $ cat2  : num  1.46 1.15 1.96 1.32
# $ target: int  0 1 1 0
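
If you still want to know which columns could not be converted, you can inspect the result of type.convert. This is a minimal sketch that rebuilds the question's data; the names converted and not_numeric are hypothetical.

```r
# Rebuild the example data from the question
cat1 <- c("1.12354","1.4548","1.9856","some_string")
cat2 <- c("1.45678","1.1478","1.9565","1.32315")
target <- c(0,1,1,0)
df <- data.frame(cat1, cat2, target, stringsAsFactors = FALSE)

# Convert what can be converted, then list the columns that stayed non-numeric
converted <- type.convert(df, as.is = TRUE)
not_numeric <- names(converted)[!sapply(converted, is.numeric)]
not_numeric
```

Here only cat1 should be listed, because it contains "some_string".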

Upvotes: 4
