Suppose I have this object, which is the dput() form of an invalid factor (for instance, printing it will complain about the duplicate level 3):
x <- structure(c(1L, 2L, 3L, 4L), .Label = c("A", "B", "A", "C"),
class = "factor")
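To see which level is the offending one without going through print(), you can inspect the levels attribute directly. This is a small illustrative sketch; the variable names `lev` and `dup` are just for the example.

```r
x <- structure(c(1L, 2L, 3L, 4L), .Label = c("A", "B", "A", "C"),
               class = "factor")

# levels() just returns the "levels" attribute, so unlike printing it
# does not complain about the duplicate.
lev <- levels(x)
dup <- which(duplicated(lev))  # positions of repeated level labels
dup        # [1] 3
lev[dup]   # [1] "A"
```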
What is the best way, using only base R, to convert it into this valid factor?
structure(c(1L, 2L, 1L, 3L), .Label = c("A", "B", "C"), class = "factor")
I managed to come up with:
factor(levels(x)[x])
but I'm not sure this will keep working without warnings in future versions of R, and it is probably quite inefficient too (the real factor I'm trying to repair is enormous).
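The approach above works because indexing a character vector by a factor uses the factor's integer codes (see `?Extract`), so `levels(x)[x]` yields the label of every element, and `factor()` then rebuilds unique levels. One base-R alternative, offered as an untested-for-speed sketch, is to remap the integer codes directly so that no character vector of length `length(x)` is ever built. The helper name `fix_dup_levels` is hypothetical.

```r
# Hypothetical helper: repair a factor whose levels contain duplicates by
# remapping its integer codes instead of converting every element to a string.
fix_dup_levels <- function(x) {
  old_lev <- levels(x)               # may contain duplicates, e.g. A B A C
  new_lev <- unique(old_lev)         # first-occurrence order, e.g. A B C
  remap   <- match(old_lev, new_lev) # old code -> new code, e.g. 1 2 1 3
  codes   <- remap[as.integer(x)]    # one integer lookup per element; NA stays NA
  structure(codes, levels = new_lev, class = "factor")
}

x <- structure(c(1L, 2L, 3L, 4L), .Label = c("A", "B", "A", "C"),
               class = "factor")
y <- fix_dup_levels(x)
identical(y, structure(c(1L, 2L, 1L, 3L), .Label = c("A", "B", "C"),
                       class = "factor"))
# [1] TRUE
```

Two differences from `factor(levels(x)[x])` are worth noting: this keeps levels in first-occurrence order rather than sorting them (use `sort(unique(old_lev))` if sorted levels are wanted), and it keeps distinct levels even if no element uses them, whereas `factor()` drops unused levels.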
Your method seems good and fairly efficient. To test it, I wrote a function that builds such malformed factors:
bad.factor <- function(nums, labs) {
  structure(nums, .Label = labs, class = "factor")
}
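For example, build a factor with one million elements whose labels are random capital letters, so almost every label is a duplicate: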
x <- bad.factor(1:1000000,gtools::chr(runif(1000000,65,90)))
Then run:
microbenchmark::microbenchmark(factor(levels(x)[x]))
Typical output is:
Unit: milliseconds
                 expr      min       lq     mean   median       uq      max neval
 factor(levels(x)[x]) 27.72593 32.98346 42.97813 34.11871 35.70919 105.3564   100
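A benchmark only measures speed. A quick base-R check that the repair is also correct on a large malformed factor might look like the sketch below. As an assumption, `sample(LETTERS, ...)` stands in for `gtools::chr(runif(...))`, and `n` is smaller than above to keep it fast.

```r
set.seed(1)
n <- 1e5
# Stand-in for the gtools-based labels: random capital letters, so
# nearly every one of the n levels is a duplicate.
x <- structure(1:n, .Label = sample(LETTERS, n, replace = TRUE),
               class = "factor")

y <- factor(levels(x)[x])

# The repaired factor has no duplicated levels ...
stopifnot(!anyDuplicated(levels(y)))
# ... and every element still carries the label it had before the repair.
stopifnot(identical(as.character(y), levels(x)[x]))
```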