Reputation: 3661
I have a data frame in which every column is a factor:
V1 V2 V3
a b c
c b a
c b c
b b a
How can I convert all the values in the data frame to numeric values in a new data frame (a to 1, b to 2, c to 3, etc.)?
Upvotes: 9
Views: 5763
Reputation: 99321
This approach is similar to Ananda's, but uses unlist() instead of factor(as.matrix()). Since all your columns are already factors, unlist() will combine them into one factor vector whose levels are the union of the columns' levels.
So let's take a look at what happens when we unlist() your data frame.
unlist(df, use.names = FALSE)
# [1] a c c b b b b b c a c a
# Levels: a b c
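To see why combining the columns first matters, here is a minimal sketch that rebuilds the sample data (assuming the same values as in the Data section below) and compares a per-column conversion with the unlist() route. The names per_column and combined are only illustrative.

```r
# Rebuild the sample data; stringsAsFactors = TRUE makes every column a factor
df <- data.frame(V1 = c("a", "c", "c", "b"),
                 V2 = c("b", "b", "b", "b"),
                 V3 = c("c", "a", "c", "a"),
                 stringsAsFactors = TRUE)

# Per-column conversion uses each column's own levels, so the codes disagree:
# V2 only has the level "b", which therefore becomes 1 instead of 2
per_column <- sapply(df, as.integer)
per_column[, "V2"]
# [1] 1 1 1 1
per_column[, "V3"]
# [1] 2 1 2 1

# unlist() merges the level sets first, so "b" is 2 and "c" is 3 everywhere
combined <- unlist(df, use.names = FALSE)
levels(combined)
# [1] "a" "b" "c"
as.integer(combined)
# [1] 1 3 3 2 2 2 2 2 3 1 3 1
```

The per-column codes are only comparable when every column happens to share the same level set, which is not the case for this data.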
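Now we can simply run as.integer() on the result, because the integer codes of the combined factor match your desired mapping. (In R versions before 4.1.0, c() also works, since it drops the factor class; from R 4.1.0 on, c() applied to a factor returns a factor.) The following will therefore revalue your entire data frame.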
df[] <- as.integer(unlist(df, use.names = FALSE))
## note: in R < 4.1.0 you can also just drop the factor class with c()
## (in R >= 4.1.0, c() on a factor returns a factor, so prefer as.integer())
## df[] <- c(unlist(df, use.names = FALSE))
df
# V1 V2 V3
# 1 1 2 3
# 2 3 2 1
# 3 3 2 3
# 4 2 2 1
Note: use.names = FALSE is not required. However, skipping the names attribute makes the call more efficient.
Data:
df <- structure(list(V1 = structure(c(1L, 3L, 3L, 2L), .Label = c("a",
"b", "c"), class = "factor"), V2 = structure(c(1L, 1L, 1L, 1L
), .Label = "b", class = "factor"), V3 = structure(c(2L, 1L,
2L, 1L), .Label = c("a", "c"), class = "factor")), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -4L))
Upvotes: 5
Reputation: 886938
Converting from factor to numeric gives the underlying integer codes. But if the factor columns have levels specified in another order, such as c('b', 'a', 'c', 'd') or c('c', 'b', 'a'), the integer values will follow that order. To avoid that, we can specify the levels explicitly by calling factor() again (safer):
df1[] <- lapply(df1, function(x)
    as.numeric(factor(x, levels = letters[1:3])))
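As a small illustration of the ordering issue, assuming a hypothetical factor x whose levels were declared as c('c', 'b', 'a'):

```r
# Hypothetical factor with levels declared in reverse alphabetical order
x <- factor(c("a", "c", "b"), levels = c("c", "b", "a"))

# The integer codes follow the declared level order, not a = 1, b = 2, c = 3
as.numeric(x)
# [1] 3 1 2

# Re-creating the factor with explicit levels restores the desired mapping
as.numeric(factor(x, levels = letters[1:3]))
# [1] 1 3 2
```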
If we are using data.table, one option would be to use set(). It would be more efficient for large datasets, whereas converting to a matrix may pose memory problems.
library(data.table)
setDT(df1)
for (j in seq_along(df1)) {
  set(df1, i = NULL, j = j,
      value = as.numeric(factor(df1[[j]], levels = letters[1:3])))
}
Upvotes: 10
Reputation: 193507
I would try:
> mydf[] <- as.numeric(factor(as.matrix(mydf)))
> mydf
V1 V2 V3
1 1 2 3
2 3 2 1
3 3 2 3
4 2 2 1
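A rough breakdown of what that one-liner does, assuming mydf holds the sample data from the question (rebuilt here so the snippet runs on its own; m and f are illustrative names):

```r
# Sample data from the question, with every column stored as a factor
mydf <- data.frame(V1 = c("a", "c", "c", "b"),
                   V2 = c("b", "b", "b", "b"),
                   V3 = c("c", "a", "c", "a"),
                   stringsAsFactors = TRUE)

# Step 1: as.matrix() turns the all-factor data frame into a character matrix
m <- as.matrix(mydf)

# Step 2: factor() builds one factor over all cells, with sorted levels
f <- factor(m)
levels(f)
# [1] "a" "b" "c"

# Step 3: the numeric codes are written back column by column via mydf[]
mydf[] <- as.numeric(f)
mydf$V3
# [1] 3 1 3 1
```

Because the levels are computed once over all cells, every column shares the same letter-to-number mapping.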
Upvotes: 12