Danielle

Reputation: 783

Convert variables from character to numeric, but excluding one character variable

Problem

Working with a data frame in R, I want to change variables represented as characters into variables represented as numbers (i.e. from class chr to num).

For an entire data set, this is a straightforward problem (different flavors of solutions here, here, here, and here). However, I have one variable that needs to stay as characters.

Example Data

Using this example data (df), let's say I want to change only var1 from class chr to num, leaving "chrOK" as a chr variable. In my real data set, there are many variables to change, so manual approaches like df$var1 = as.numeric(df$var1) are too laborious.

df = data.frame(var1  = c("1","2","3","4"), 
                var2  = c(1,2,3,4),
                chrOK = c("rick", "summer","beth", "morty"),
                stringsAsFactors = FALSE)

str(df)

'data.frame':   4 obs. of  3 variables:
$ var1 : chr  "1" "2" "3" "4"
$ var2 : num  1 2 3 4
$ chrOK: chr  "rick" "summer" "beth" "morty"

Partial Solutions

I've tried several approaches that seem close, but don't do exactly what I want.

Attempt 1 — introduces NAs

Most of my columns are characters that should be numeric, like "var1". So, using apply() to convert the class works for those columns. However, this approach induces NA values in "chrOK".

df = as.data.frame(apply(df, 2, function(x) as.numeric(x))) 

Warning message:
In FUN(newX[, i], ...) : NAs introduced by coercion

str(df)
'data.frame':   4 obs. of  3 variables:
$ var1 : num  1 2 3 4
$ var2 : num  1 2 3 4
$ chrOK: num  NA NA NA NA

Attempt 2 — split, convert, cbind

Using apply() on the subset of chr variables, excluding "chrOK", doesn't induce NAs, but requires using cbind() to re-include "chrOK".

This solution is not ideal because cbind() results are hard to check for data mutations. (Also, "chrOK" is returned as a factor. Using df = cbind(changed,as.character(unchanged)) doesn't work. [a])

changed = as.data.frame(apply(df[-(which(colnames(df)=="chrOK"))],2,function(x) as.numeric(x)))
unchanged = (df$chrOK)

df = cbind(changed,unchanged)

str(df)
'data.frame':   4 obs. of  3 variables:
$ var1     : num  1 2 3 4
$ var2     : num  1 2 3 4
$ unchanged: Factor w/ 4 levels "beth","morty",..: 3 4 1 2 #[a]

Attempt 3 — correct subset, but error when converting

Using setdiff() I get the subset of chr class variables excluding "chrOK".

df[setdiff(names(df[sapply(df,is.character)]),"chrOK")]
  var1
1    1
2    2
3    3
4    4

But trying to plug this into an apply function, so that only the subset is changed from chr to num, returns an error (see [b]).

 apply(as.numeric(df[setdiff(names(df[sapply(df,is.character)]),"chrOK")]),
       2,function(x) as.numeric(x))

Error in apply(as.numeric(df[setdiff(names(df[sapply(df, is.character)]),  :
(list) object cannot be coerced to type 'double' #[b]

Questions

Upvotes: 3

Views: 4359

Answers (1)

akrun

Reputation: 887213

We can use type.convert from base R: loop over the columns of the dataset and assign the result back to the original object.

df[] <- lapply(df, function(x) type.convert(as.character(x), as.is = TRUE))
str(df)
#'data.frame':   4 obs. of  3 variables:
#$ var1 : int  1 2 3 4
#$ var2 : int  1 2 3 4
#$ chrOK: chr  "rick" "summer" "beth" "morty"

type.convert calls C code internally, i.e. C_typeconvert.
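If the columns need to end up as num rather than int, or if only a known set of columns should be touched, a more targeted variant may be useful. The following is a minimal sketch, not part of the approach above: it reuses the setdiff() subset from the question's Attempt 3 and swaps apply() for lapply(). The name to_convert is a hypothetical helper introduced only for illustration.

```r
# Example data from the question
df <- data.frame(var1  = c("1", "2", "3", "4"),
                 var2  = c(1, 2, 3, 4),
                 chrOK = c("rick", "summer", "beth", "morty"),
                 stringsAsFactors = FALSE)

# Names of the character columns, minus the one that must stay character
# (to_convert is a hypothetical name, not from the original post)
to_convert <- setdiff(names(df)[sapply(df, is.character)], "chrOK")

# lapply() works column by column on the list of columns,
# so the data frame is never coerced to a matrix
df[to_convert] <- lapply(df[to_convert], as.numeric)

str(df)
```

Because only the selected columns are assigned back, "chrOK" is left untouched and no cbind() step is needed.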


The OP's attempts produce NAs for the following reasons:

1) apply converts the data.frame to a matrix, and a matrix can hold only a single class. If there is even one character element in the matrix, every element is converted to character.

2) Using as.numeric with apply is problematic because 'chrOK' is a character column. Whenever as.numeric is applied to a non-numeric string, it returns NA.

3) The OP used the same apply call in the second method, so the behavior described in 1) applies there as well.
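The matrix coercion in point 1) can be checked directly. This is a small illustrative sketch using the question's example data. It calls as.matrix() explicitly to mimic what apply() does to a data frame, and the names m and converted are hypothetical.

```r
# Example data from the question
df <- data.frame(var1  = c("1", "2", "3", "4"),
                 var2  = c(1, 2, 3, 4),
                 chrOK = c("rick", "summer", "beth", "morty"),
                 stringsAsFactors = FALSE)

# apply() coerces a data frame to a matrix first; with a character
# column present, the whole matrix becomes character
m <- as.matrix(df)
is.character(m)

# as.numeric() on non-numeric strings gives NA (with a coercion warning,
# suppressed here to keep the output short)
converted <- suppressWarnings(as.numeric(m[, "chrOK"]))
converted
```

Every column, including the originally numeric var2, passes through as.numeric() as strings, which is why only the non-numeric "chrOK" column ends up as NA.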

Upvotes: 2
