Reputation: 75
I am trying to convert all factor variables in a large data frame to numeric variables. During the conversion, the variable labels (descriptive names of the variables) are lost in the new data frame. Is there an easy way to convert factor variables to numeric variables in a data frame without losing the variable labels? Sample code is given below. Thank you.
v1 <- c('1','4','5')
v2 <- c('21000', '23400', '26800')
v3 <- c('2010','2008','2007')
data <- data.frame(v1, v2, v3)
library(Hmisc)
label(data$v1) <- "Number"
label(data$v2) <- "Value"
label(data$v3) <- "Year"
data[] <- as.numeric(factor(as.matrix(data)))
View(data)
Upvotes: 2
Views: 191
Reputation: 72994
You could save the attributes beforehand and restore them.
## save labels
attr.data <- lapply(dat, attr, "label")
## convert to numeric and restore labels
dat[] <- Map(function(x, y) `attr<-`(as.numeric(levels(x))[x], "label", y), dat, attr.data)
In one step:
dat[] <- Map(function(x, y) `attr<-`(as.numeric(levels(x))[x], "label", y),
             dat, Map(attr, dat, "label"))
The labels are stored as attributes of each column (try attributes(dat$v1)) and can be accessed by name with attr. The label attribute is named "label", so we can capture it before the conversion and reattach it afterwards. Map iterates over the columns and their saved labels in parallel, which ensures that each label is assigned back to the correct column.
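To see why the conversion goes through levels(x) and why the label has to be reattached, here is a minimal sketch on a single factor. It assumes only base R: the label is set directly with attr rather than with Hmisc, and the names f and num are illustrative.

```r
# A factor stores integer codes that index into its (sorted) levels.
f <- factor(c("2010", "2008", "2007"))
attr(f, "label") <- "Year"   # set the label directly, without Hmisc

as.numeric(f)                # the integer codes (3 2 1), not the years

# Index the numeric levels by the factor to recover the real values.
num <- as.numeric(levels(f))[f]
num                          # 2010 2008 2007

# The conversion drops the attribute, so it has to be reattached.
is.null(attr(num, "label"))  # TRUE
attr(num, "label") <- attr(f, "label")
attr(num, "label")           # "Year"
```

Calling as.numeric on the factor itself would silently return the codes, which is why the answer indexes the converted levels instead.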
dat
# v1 v2 v3
# 1 1 21000 2010
# 2 4 23400 2008
# 3 5 26800 2007
str(dat)
# 'data.frame': 3 obs. of 3 variables:
# $ v1: num 1 4 5
# ..- attr(*, "label")= chr "Number"
# $ v2: num 21000 23400 26800
# ..- attr(*, "label")= chr "Value"
# $ v3: num 2010 2008 2007
# ..- attr(*, "label")= chr "Year"
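In a larger data frame, not every column will necessarily be a factor. The following is a sketch of the same idea wrapped in a helper that skips non-factor columns. The function name factors_to_numeric and the example data frame df are hypothetical, and the labels are set with base attr instead of Hmisc so that the block is self-contained.

```r
# Hypothetical helper: convert only factor columns, keeping "label" attributes.
factors_to_numeric <- function(df) {
  df[] <- lapply(df, function(x) {
    if (!is.factor(x)) return(x)     # leave non-factor columns untouched
    lab <- attr(x, "label")          # NULL if the column has no label
    out <- as.numeric(levels(x))[x]  # convert via the level strings
    attr(out, "label") <- lab        # assigning NULL changes nothing
    out
  })
  df
}

# Example data: two labelled factor columns and one character column.
df <- data.frame(
  v1 = factor(c("1", "4", "5")),
  v3 = factor(c("2010", "2008", "2007")),
  id = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
attr(df$v1, "label") <- "Number"
attr(df$v3, "label") <- "Year"

res <- factors_to_numeric(df)
str(res)
```

Factor levels that are not valid numbers would become NA with a warning, so it may be worth checking the levels before converting. Note also that since R 4.0.0, data.frame() no longer turns character vectors into factors by default, so the columns are created as factors explicitly here.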
Data
dat <- structure(list(v1 = structure(1:3, .Label = c("1", "4", "5"), class = c("labelled",
"factor"), label = "Number"), v2 = structure(1:3, .Label = c("21000",
"23400", "26800"), class = c("labelled", "factor"), label = "Value"),
v3 = structure(3:1, .Label = c("2007", "2008", "2010"), class = c("labelled",
"factor"), label = "Year")), row.names = c(NA, -3L), class = "data.frame")
Sidenote: I use dat rather than data here, because data is already the name of an R function that loads datasets.
Upvotes: 1