Reputation: 535
I have a large data frame (1) with many columns that are factors. I want to change the factor level order for each factor.
I have a lookup data frame (2) that holds the right factor level orders. This means I can look up the order for a given factor by name and store it in a separate variable. So far so good.
Simplified example:
library(tibble)
d = tibble(
  size = c('small', 'small', 'big', NA)
)
d$size = as.factor(d$size)
levels(d$size) # Not what I want.
proper.order = c('small', 'big') # this comes from somewhere else
I can use proper.order to change one column in d.
d$size = factor(d$size, levels = proper.order)
levels(d$size) # What I want.
I want to refer to the column name (size) using a variable.
This doesn't work:
my.column = 'size'
d[names(d) == my.column] = factor(d[names(d) == my.column], levels = proper.order, exclude = NULL)
levels(d$size) # What I want.
d # Not what I want.
I expect the factor levels to be reordered, and they are. I also expect the factor to keep its values (obviously), but they are all set to NA.
I suspect this is because d[names(d) == my.column] is a tibble, not a factor. But then why do factor levels change? And how can I reach into the tibble and grab the factor?
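A minimal sketch of the suspected cause, assuming the same toy data as above: single-bracket indexing returns a one-column tibble, while double-bracket indexing returns the factor itself. A plausible reading of the symptom is that factor() cannot match a whole tibble against the levels, so it produces a single NA carrying the new levels, which is then recycled down the column. Extracting with `[[` avoids this.

```r
library(tibble)

d <- tibble(size = factor(c("small", "small", "big", NA)))
proper.order <- c("small", "big")  # assumed to come from the lookup
my.column <- "size"

# `[` keeps the tibble wrapper; `[[` pulls out the column itself
class(d[my.column])          # a tibble (data frame), not a factor
is.factor(d[[my.column]])    # the underlying factor

# Re-level the column by name, reading and writing with `[[`
d[[my.column]] <- factor(d[[my.column]], levels = proper.order)

levels(d[[my.column]])        # level order after re-leveling
as.character(d[[my.column]])  # values after re-leveling
```

Because both the read and the write use `[[`, the values stay attached to the rows and only the level order changes.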
Upvotes: 1
Views: 1868
Reputation: 887831
For multiple columns, we can specify them in mutate_at:
library(dplyr)
d %>%
  mutate_at(vars(my.column),
            list(~ factor(., levels = proper.order, exclude = NULL)))
Or with fct_relevel from forcats:
library(forcats)
d %>%
mutate_at(vars(my.column), list(~ fct_relevel(., proper.order)))
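As a sketch of how this might extend to many columns with a different order per column, the version below uses across() with all_of() and cur_column() from dplyr 1.0.0 or later. The speed column and the level.orders list are hypothetical stand-ins for the real data and the lookup data frame, which are not shown in the question.

```r
library(dplyr)
library(tibble)

d <- tibble(
  size  = factor(c("small", "small", "big", NA)),
  speed = factor(c("fast", "slow", "slow", "fast"))  # hypothetical second factor
)

# Hypothetical lookup: one vector of level orders per column name
level.orders <- list(
  size  = c("small", "big"),
  speed = c("slow", "fast")
)

# For each listed column, rebuild the factor with that column's own order;
# cur_column() gives the name of the column currently being processed
d <- d %>%
  mutate(across(all_of(names(level.orders)),
                ~ factor(.x, levels = level.orders[[cur_column()]])))

levels(d$size)
levels(d$speed)
```

all_of() makes it explicit that the column names come from an external character vector rather than from columns of d.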
Upvotes: 3