mamatv

Reputation: 3661

How to convert a data frame of factors to numeric?

I have a data frame in which every column is a factor:

V1 V2 V3
 a  b  c
 c  b  a
 c  b  c
 b  b  a

How can I convert all the values in the data frame into a new one with numeric values (a to 1, b to 2, c to 3, etc.)?

Upvotes: 9

Views: 5763

Answers (3)

Rich Scriven

Reputation: 99321

This approach is similar to Ananda's, but uses unlist() instead of factor(as.matrix()). Since all your columns are already factors, unlist() will combine them into one factor vector whose levels are the union of the columns' levels, in the order they first appear (here a, b, c).

So let's take a look at what happens when we unlist() your data frame.

unlist(df, use.names = FALSE)
#  [1] a c c b b b b b c a c a
# Levels: a b c

Now we can simply run as.integer() on the above result, because the integer codes of the factor match your desired mapping. (In R versions before 4.1.0, c() also dropped the factor class; since 4.1.0 it returns a factor.) The following revalues your entire data frame.

df[] <- as.integer(unlist(df, use.names = FALSE))
## note: in R versions before 4.1.0 you could also drop the factor class with c()
## df[] <- c(unlist(df, use.names = FALSE))
df
#   V1 V2 V3
# 1  1  2  3
# 2  3  2  1
# 3  3  2  3
# 4  2  2  1

Note: use.names = FALSE is not necessary. However, dropping the names attribute makes the process more efficient.
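One caveat worth sketching: unlist() builds the combined levels in the order they occur across the columns' level sets, which is not necessarily alphabetical. The example below uses a hypothetical data frame, df2 (not the asker's data), whose first column only contains "b", and re-levels the combined factor before taking the integer codes. Treat it as an illustration under that assumption rather than a required step.

```r
# Hypothetical data: the first column's only level is "b"
df2 <- data.frame(V1 = factor(c("b", "b")),
                  V2 = factor(c("a", "c")))

f <- unlist(df2, use.names = FALSE)
levels(f)
# [1] "b" "a" "c"

# Re-level alphabetically so that a = 1, b = 2, c = 3
f_sorted <- factor(f, levels = sort(levels(f)))
df2[] <- as.integer(f_sorted)
df2
#   V1 V2
# 1  2  1
# 2  2  3
```

With the asker's data this extra step changes nothing, because the first column already carries the levels a, b, c in that order.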

Data:

df <- structure(list(V1 = structure(c(1L, 3L, 3L, 2L), .Label = c("a", 
"b", "c"), class = "factor"), V2 = structure(c(1L, 1L, 1L, 1L
), .Label = "b", class = "factor"), V3 = structure(c(2L, 1L, 
2L, 1L), .Label = c("a", "c"), class = "factor")), .Names = c("V1", 
"V2", "V3"), class = "data.frame", row.names = c(NA, -4L))

Upvotes: 5

akrun

Reputation: 886938

Converting a factor to numeric gives its integer codes. But if the factor columns have levels specified as c('b', 'a', 'c', 'd') or c('c', 'b', 'a'), the integer codes will follow that order. To avoid that, we can specify the levels explicitly by calling factor() again (safer):

df1[] <- lapply(df1, function(x) 
                as.numeric(factor(x, levels=letters[1:3])))
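To make the level-order issue concrete, here is a minimal sketch with a hypothetical vector x (not part of the question's data) whose levels were declared in reverse order. It shows the codes before and after re-specifying the levels.

```r
# Hypothetical factor with levels declared as c, b, a
x <- factor(c("a", "c", "b"), levels = c("c", "b", "a"))

# Integer codes follow the declared level order: c = 1, b = 2, a = 3
as.numeric(x)
# [1] 3 1 2

# Re-specifying the levels gives the intended mapping: a = 1, b = 2, c = 3
as.numeric(factor(x, levels = letters[1:3]))
# [1] 1 3 2
```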

If we are using data.table, one option is set(), which is more efficient for large datasets. Converting to a matrix may cause memory problems.

library(data.table)
setDT(df1)
for (j in seq_along(df1)) {
  set(df1, i = NULL, j = j,
      value = as.numeric(factor(df1[[j]], levels = letters[1:3])))
}

Upvotes: 10

A5C1D2H2I1M1N2O1R2T1

Reputation: 193507

I would try:

> mydf[] <- as.numeric(factor(as.matrix(mydf)))
> mydf
  V1 V2 V3
1  1  2  3
2  3  2  1
3  3  2  3
4  2  2  1

Upvotes: 12
