Reputation: 1161
This is probably simple and I feel stupid for asking. I want to change the levels of a factor in a data frame, using mutate. Simple example:
library("dplyr")
dat <- data.frame(x = factor("A"), y = 1)
mutate(dat,levels(x) = "B")
I get:
Error: Unexpected '=' in "mutate(dat,levels(x) ="
Why is this not working? How can I change factor levels with mutate?
Upvotes: 90
Views: 198025
Reputation: 1817
With the forcats package from the tidyverse this is easy, too.
mutate(dat, x = fct_recode(x, "B" = "A"))
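For completeness, here is a self-contained sketch using the data frame from the question. The original call fails because mutate() expects name = value pairs, and levels(x) is not a valid argument name. The name dat2 is only an illustrative choice.

```r
library(dplyr)
library(forcats)

dat <- data.frame(x = factor("A"), y = 1)

# fct_recode() takes pairs of the form new_level = "old_level"
dat2 <- mutate(dat, x = fct_recode(x, "B" = "A"))

levels(dat2$x)
#[1] "B"
```

Levels that are not mentioned in the call are left unchanged.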
Upvotes: 81
Reputation: 1565
From my understanding, the currently accepted answer only changes the order of the factor levels, not the actual labels (i.e., what the levels of the factor are called). To illustrate the difference between levels and labels, consider the following example:
Turn cyl into a factor (specifying levels is not necessary here, as they are sorted in alphanumeric order by default):
mtcars2 <- mtcars %>% mutate(cyl = factor(cyl, levels = c(4, 6, 8)))
mtcars2$cyl[1:5]
#[1] 6 6 4 6 8
#Levels: 4 6 8
Change the order of the levels (but not the labels themselves: cyl is still the same column)
mtcars3 <- mtcars2 %>% mutate(cyl = factor(cyl, levels = c(8, 6, 4)))
mtcars3$cyl[1:5]
#[1] 6 6 4 6 8
#Levels: 8 6 4
all(mtcars3$cyl==mtcars2$cyl)
#[1] TRUE
Assign new labels to cyl
The order of the levels was c(8, 6, 4), so we specify the new labels in that order:
mtcars4 <- mtcars3 %>% mutate(cyl = factor(cyl, labels = c("new_value_for_8",
"new_value_for_6",
"new_value_for_4" )))
mtcars4$cyl[1:5]
#[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
#Levels: new_value_for_8 new_value_for_6 new_value_for_4
Note how this column differs from the previous ones:
all(as.character(mtcars4$cyl)!=mtcars3$cyl)
#[1] TRUE
#Note: TRUE here indicates that all values are unequal because I used != instead of ==
#as.character() was required as the levels were numeric and thus not comparable to a character vector
More details:
If we were to change the labels of cyl using mtcars2 instead of mtcars3, we would need to specify the labels in a different order to get the same result. The order of the levels for mtcars2 was c(4, 6, 8), so we specify the new labels as follows:
#change labels of mtcars2 (order used to be: c(4, 6, 8))
mtcars5 <- mtcars2 %>% mutate(cyl = factor(cyl, labels = c("new_value_for_4",
"new_value_for_6",
"new_value_for_8" )))
Unlike mtcars3$cyl and mtcars4$cyl, the labels of mtcars4$cyl and mtcars5$cyl are thus identical, even though their levels have a different order.
mtcars4$cyl[1:5]
#[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
#Levels: new_value_for_8 new_value_for_6 new_value_for_4
mtcars5$cyl[1:5]
#[1] new_value_for_6 new_value_for_6 new_value_for_4 new_value_for_6 new_value_for_8
#Levels: new_value_for_4 new_value_for_6 new_value_for_8
all(mtcars4$cyl==mtcars5$cyl)
#[1] TRUE
levels(mtcars4$cyl) == levels(mtcars5$cyl)
#[1] FALSE TRUE FALSE
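The same distinction can be sketched more compactly on a toy vector. The vector f and its level names are hypothetical and not part of the mtcars example.

```r
# Hypothetical toy data; by default the levels sort to "hi", "lo"
f <- factor(c("lo", "hi", "lo"))

# levels = only reorders the levels; the values themselves stay the same
f_reordered <- factor(f, levels = c("lo", "hi"))

# labels = renames the levels, matched by position to the current order ("hi", "lo")
f_relabelled <- factor(f, labels = c("HIGH", "LOW"))

as.character(f_reordered)   # still "lo", "hi", "lo"
as.character(f_relabelled)  # now "LOW", "HIGH", "LOW"
```

Because labels are matched by position, it is worth checking levels(f) before assigning them.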
Upvotes: 19
Reputation: 1485
You can use the recode function from dplyr.
df <- iris %>%
mutate(Species = recode(Species, setosa = "SETOSA",
versicolor = "VERSICOLOR",
virginica = "VIRGINICA"
)
)
Upvotes: 24
Reputation: 713
Can't comment because I don't have enough reputation points, but recode only works on a vector, so the above code in @Stefano's answer should be
df <- iris %>%
mutate(Species = recode(Species,
setosa = "SETOSA",
versicolor = "VERSICOLOR",
virginica = "VIRGINICA")
)
Upvotes: 18
Reputation: 2356
Maybe you are looking for the plyr::revalue function:
mutate(dat, x = plyr::revalue(x, c("A" = "B")))
See also plyr::mapvalues.
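A minimal sketch of the mapvalues variant, assuming plyr is installed. It is called with the plyr:: prefix so that attaching plyr does not mask dplyr's mutate(). The name dat3 is only illustrative.

```r
library(dplyr)

dat <- data.frame(x = factor("A"), y = 1)

# mapvalues() takes parallel vectors of old and new values;
# for a factor it rewrites the levels
dat3 <- mutate(dat, x = plyr::mapvalues(x, from = "A", to = "B"))

levels(dat3$x)
#[1] "B"
```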
Upvotes: 23
Reputation: 21621
I'm not quite sure I understand your question properly, but if you want to change the factor levels of cyl with mutate() you could do:
df <- mtcars %>% mutate(cyl = factor(cyl, levels = c(4, 6, 8)))
You would get:
#> str(df$cyl)
# Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
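One caveat, sketched on the factor from the question: passing levels = to factor() selects and orders existing values and does not rename them, so a value that is missing from the new levels becomes NA. The variable names below are illustrative only.

```r
x <- factor("A")

# "A" is not among the requested levels, so the value is lost
not_renamed <- factor(x, levels = "B")
is.na(not_renamed)
#[1] TRUE

# Assigning to levels() renames the level and keeps the value
renamed <- x
levels(renamed) <- "B"
as.character(renamed)
#[1] "B"
```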
Upvotes: 56