petyar

Reputation: 535

Change factor levels in R using a variable for BOTH factor name AND level order in a data frame

I have a large data frame (data frame 1) with many factor columns. I want to change the level order of each factor.

I also have a lookup data frame (data frame 2) that holds the correct level order for each factor. This means I can look up the order using a variable that holds the factor's name, and store that order in another variable. So far so good.

Simplified example:

library(tibble)

d = tibble(
  size = c('small','small','big', NA)
)
d$size = as.factor(d$size)

levels(d$size) # Not what I want.

proper.order = c('small', 'big') # this comes from somewhere else

I can use proper.order to change one column in d.

d$size = factor(d$size, levels = proper.order)

levels(d$size) # What I want.

I want to refer to the column name (size) using a variable.

This doesn't work:

my.column = 'size'

d[names(d) == my.column] = factor(d[names(d) == my.column], levels = proper.order, exclude = NULL)


levels(d$size) # What I want.
d # Not what I want.

I expect the factor levels to be reordered, and they are. I also expect the factor to keep its values (obviously), but they are all set to NA.

I suspect this is because d[names(d) == my.column] is a tibble, not a factor. But then why do factor levels change? And how can I reach into the tibble and grab the factor?

Upvotes: 1

Views: 1868

Answers (1)

akrun

Reputation: 887831

For one or more columns whose names are stored in a variable, we can use mutate_at (superseded by across() since dplyr 1.0.0, but still available):

library(dplyr)
d %>% 
   mutate_at(vars(my.column), 
        list(~ factor(., levels = proper.order, exclude = NULL)))
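With dplyr 1.0.0 or later, the same idea can probably be written with across() and all_of(). This is a minimal sketch rather than part of the original answer. It assumes the d, my.column and proper.order objects from the question, rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Rebuild the example data from the question
d <- tibble(size = factor(c("small", "small", "big", NA)))
proper.order <- c("small", "big")  # desired level order
my.column <- "size"                # column name held in a variable

# all_of() selects the columns named in the character vector;
# the lambda re-creates each selected factor with the desired level order
d2 <- d %>%
  mutate(across(all_of(my.column), ~ factor(.x, levels = proper.order)))

levels(d2$size)
```

all_of() makes it explicit that my.column is an external character vector of column names, not a column of d.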

Or with fct_relevel from forcats

library(forcats)
d %>%
    mutate_at(vars(my.column), list(~ fct_relevel(., proper.order))) 
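As for why the attempt in the question fails: d[names(d) == my.column] returns a one-column tibble (a list), not the factor itself, so factor() most likely cannot match its contents against the requested levels and produces NA. Double brackets, d[[my.column]], return the column vector. Below is a minimal base R sketch of that approach, looped over several columns. The level.lookup list and the speed column are hypothetical stand-ins for the lookup data frame described in the question, and a plain data.frame is used so that no packages are needed; `[[` behaves the same way on a tibble.

```r
# Example data with two factor columns (speed is a made-up second column)
d <- data.frame(
  size  = factor(c("small", "small", "big", NA)),
  speed = factor(c("fast", "slow", "slow", "fast"))
)

# Hypothetical lookup: column name -> desired level order
level.lookup <- list(
  size  = c("small", "big"),
  speed = c("slow", "fast")
)

# [[ extracts the factor itself, so the values survive the releveling
for (col in names(level.lookup)) {
  d[[col]] <- factor(d[[col]], levels = level.lookup[[col]])
}

levels(d$size)
levels(d$speed)
```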

Upvotes: 3
