Melissa Salazar

Reputation: 565

Assign partial factor levels in R

I want to assign factor levels, but I won't always know all of the values in advance, so I want to make sure that a few specific values come first when they are present. For example, I want strawberry to be factor level 1 and kiwi to be factor level 2, with all the remaining values assigned alphabetically.

data <- data.frame(
  parameter = c(rep("apple", 3), rep("banana", 3), rep("strawberry", 3), rep("kiwi", 3)),
  date = rep(as.Date(c('2021-01-01', '2021-01-02', '2021-01-03')), 4),
  value = c(0, 2, 3, 0, 0, 1, 2, 3, 4, 0, 0, 0)
)

For this data, the levels should be ordered strawberry, kiwi, apple, then banana. Unfortunately, I won't always know what the other values will be. Sometimes they may be apple and banana; other times they could be apple, banana, and pear. The possibilities are endless.

For extra context: a user uploads a CSV with these 3 columns to a Shiny app, but the parameter values can be different for every user. If strawberry or kiwi appears in the parameter column, it needs to come first in the factor levels (strawberry before kiwi), with all other values ordered alphabetically after them.

Thanks in advance!

Upvotes: 1

Views: 424

Answers (1)

akrun

Reputation: 887118

We can use setdiff to build the order of the factor levels: the values in 'v1' first, then every other value sorted alphabetically.

v1 <- c('strawberry', 'kiwi')
data$parameter <- with(data, droplevels(factor(parameter,
          levels = c(v1, sort(setdiff(parameter, v1))))))

levels(data$parameter)
#[1] "strawberry" "kiwi"       "apple"      "banana"   

NOTE: The factor call is wrapped in droplevels so that 'strawberry' or 'kiwi' is removed from the levels when it is not present in the data.
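To see what droplevels does here, below is a sketch with a hypothetical upload, data2 (not from the question), where 'kiwi' is missing and 'pear' is new. The level-building code is the same as above.

```r
# Hypothetical upload: 'kiwi' is absent and 'pear' is a new value
data2 <- data.frame(
  parameter = c("pear", "apple", "strawberry", "banana"),
  value = c(1, 2, 3, 4)
)

v1 <- c('strawberry', 'kiwi')
# 'kiwi' is still listed in levels, but droplevels removes it because it is unused
data2$parameter <- with(data2, droplevels(factor(parameter,
          levels = c(v1, sort(setdiff(parameter, v1))))))

levels(data2$parameter)
#[1] "strawberry" "apple"      "banana"     "pear"
```

Without droplevels, "kiwi" would remain as an empty level in position 2.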

The code above may look perplexing. The logic is:

  • setdiff - returns the unique values of the column that are not in 'v1'
  • sort - sorts those values in increasing (alphabetical) order, which is the default
  • c - concatenates the 'v1' values in front of the sorted vector
  • levels - passes the combined vector as the levels argument of factor
  • droplevels - removes unused levels, in case some values in 'v1' are not present
  • assign the resulting factor back to the original column
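The steps in the list can be traced one at a time. This sketch uses a small hypothetical vector x (not from the question) whose values are unsorted and contain a duplicate, so each step visibly changes something.

```r
# Hypothetical column values, in no particular order, with a duplicate
x <- c("pear", "strawberry", "apple", "kiwi", "banana", "apple")
v1 <- c('strawberry', 'kiwi')

# setdiff: unique values of x that are not in v1 (first-appearance order)
others <- setdiff(x, v1)
others
#[1] "pear"   "apple"  "banana"

# sort: alphabetical order
others_sorted <- sort(others)
others_sorted
#[1] "apple"  "banana" "pear"

# c: put the v1 values in front
lvls <- c(v1, others_sorted)
lvls
#[1] "strawberry" "kiwi"       "apple"      "banana"     "pear"

# factor(levels = ...) fixes the level order; droplevels removes any
# v1 value that does not occur in x (none in this example)
f <- droplevels(factor(x, levels = lvls))

# Integer codes: strawberry is level 1, kiwi is level 2
as.integer(f)
#[1] 5 1 3 2 4 3
```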

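Another option is fct_relevel from the forcats package, which moves the levels named in 'v1' to the front and keeps the remaining levels in their existing order (alphabetical when the column starts out as a character vector).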

library(forcats)
data$parameter <- fct_relevel(data$parameter, v1)

If we want to use the tidyverse, just copy the code inside with() into mutate:

library(dplyr)
data <- data %>%
         mutate(parameter = droplevels(factor(parameter,
          levels = c(v1, sort(setdiff(parameter, v1))))))
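Because the parameters change with every upload, it may be convenient to wrap this logic in a function. The sketch below uses a hypothetical helper name, order_parameter_levels, and simulates the uploaded CSV with read.csv(text = ...). The as.character() call is an extra safeguard (not in the code above) in case the column arrives as a factor.

```r
# Hypothetical helper: puts the values in `first` at the front of the
# factor levels (when present) and sorts everything else alphabetically
order_parameter_levels <- function(x, first = c("strawberry", "kiwi")) {
  x <- as.character(x)  # accept either a character or a factor column
  lvls <- c(first, sort(setdiff(x, first)))
  droplevels(factor(x, levels = lvls))
}

# Simulated upload; in a Shiny app this might instead be
# read.csv(input$file$datapath), where "file" is a hypothetical fileInput id
csv_text <- "parameter,date,value
pear,2021-01-01,1
kiwi,2021-01-01,0
apple,2021-01-02,2
banana,2021-01-03,3"
uploaded <- read.csv(text = csv_text)

uploaded$parameter <- order_parameter_levels(uploaded$parameter)
levels(uploaded$parameter)
#[1] "kiwi"   "apple"  "banana" "pear"
```

In a dplyr pipeline the same helper could be called as mutate(parameter = order_parameter_levels(parameter)).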

Upvotes: 1
