Reputation:
When doing data analysis, I sometimes need to recode values to factors in order to carry out group analysis. I want the order of the factor levels to match the order of the conditions in case_when. In this case, the order should be "Excellent", "Good", "Fail". How can I achieve this without tediously repeating the labels, as in levels=c('Excellent', 'Good', 'Fail')?
Thank you very much.
library(dplyr, warn.conflicts = FALSE)
set.seed(1234)
score <- runif(100, min = 0, max = 100)
Performance <- function(x) {
  case_when(
    is.na(x) ~ NA_character_,
    x > 80 ~ 'Excellent',
    x > 50 ~ 'Good',
    TRUE ~ 'Fail'
  ) %>% factor(levels=c('Excellent', 'Good', 'Fail'))
}
performance <- Performance(score)
levels(performance)
#> [1] "Excellent" "Good"      "Fail"
table(performance)
#> performance
#> Excellent      Good      Fail
#>        15        30        55
Upvotes: 24
Views: 10615
Reputation: 706
Let case_when() output numbers and use the labels argument in factor():
library(dplyr, warn.conflicts = FALSE)
set.seed(1234)
score <- runif(100, min = 0, max = 100)
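Performance <- function(x) {
  case_when(
    is.na(x) ~ NA_real_,
    x > 80 ~ 1,
    x > 50 ~ 2,
    TRUE ~ 3
  ) %>%
    # levels = 1:3 fixes the code-to-label mapping, so this still works
    # when one of the categories does not occur in x
    factor(levels = 1:3, labels=c('Excellent', 'Good', 'Fail'))
}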
performance <- Performance(score)
levels(performance)
#> [1] "Excellent" "Good"      "Fail"
table(performance)
#> performance
#> Excellent      Good      Fail
#>        15        30        55
Created on 2023-01-13 with reprex v2.0.2
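One thing to watch with the labels approach: factor() infers its levels from the values that actually occur, so the number of labels has to match. The following is a minimal base-R sketch of that behaviour, not part of the answer above; the vector `codes` is made up for illustration and stands in for the numeric output of case_when().

```r
labels <- c("Excellent", "Good", "Fail")
codes <- c(2, 3, 3, 2)  # hypothetical input in which code 1 never occurs

# Levels inferred from the data: only 2 and 3 exist, so three labels do not fit
# and factor() raises an error, which is caught here
inferred <- tryCatch(factor(codes, labels = labels), error = function(e) "error")

# Levels stated explicitly: 1/2/3 map to the labels regardless of the data
explicit <- factor(codes, levels = 1:3, labels = labels)
levels(explicit)
#> [1] "Excellent" "Good"      "Fail"
as.character(explicit)
#> [1] "Good" "Fail" "Fail" "Good"
```

Passing levels = 1:3 alongside labels keeps the function usable on inputs where a category happens to be empty.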
Upvotes: 2
Reputation:
Finally, I came up with a solution. For those who are interested, here it is. I wrote a function fct_case_when (pretending to be a function in forcats). It is just a wrapper around case_when with factor output. The order of the levels is the same as the argument order.
fct_case_when <- function(...) {
  args <- as.list(match.call())
  levels <- sapply(args[-1], function(f) f[[3]])  # extract RHS of formula
  levels <- levels[!is.na(levels)]
  factor(dplyr::case_when(...), levels=levels)
}
Now I can use fct_case_when in place of case_when, and the result is the same as the previous implementation (but less tedious).
Performance <- function(x) {
  fct_case_when(
    is.na(x) ~ NA_character_,
    x > 80 ~ 'Excellent',
    x > 50 ~ 'Good',
    TRUE ~ 'Fail'
  )
}
performance <- Performance(score)
levels(performance)
#> [1] "Excellent" "Good"      "Fail"
table(performance)
#> performance
#> Excellent      Good      Fail
#>        15        30        55
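A possible refinement, offered as a sketch rather than as the author's method: if two conditions map to the same label, the extracted levels contain a duplicate and factor() rejects them. The hypothetical variant below (the name fct_case_when_unique is made up) de-duplicates the labels first. It still assumes that every right-hand side is a literal string or NA_character_; values produced by computed right-hand sides would not appear among the levels.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical variant of fct_case_when that tolerates repeated labels
fct_case_when_unique <- function(...) {
  args <- as.list(match.call())[-1]
  rhs <- lapply(args, function(f) f[[3]])  # right-hand side of each formula
  # keep only literal, non-missing strings
  is_label <- vapply(
    rhs,
    function(r) is.character(r) && length(r) == 1 && !is.na(r),
    logical(1)
  )
  levels <- unique(unlist(rhs[is_label]))  # first appearance decides the order
  factor(dplyr::case_when(...), levels = levels)
}

x <- c(5, 50, 95)
f <- fct_case_when_unique(
  x < 10 ~ "Extreme",
  x > 90 ~ "Extreme",
  TRUE ~ "Normal"
)
levels(f)
#> [1] "Extreme" "Normal"
```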
Upvotes: 13
Reputation: 113
This is an implementation I have been using:
library(dplyr)
library(purrr)
library(rlang)
library(forcats)
factored_case_when <- function(...) {
  args <- list2(...)
  rhs <- map(args, f_rhs)
  cases <- case_when(
    !!!args
  )
  exec(fct_relevel, cases, !!!rhs)
}
numbers <- c(2, 7, 4, 3, 8, 9, 3, 5, 2, 7, 5, 4, 1, 9, 8)
factored_case_when(
  numbers <= 2 ~ "Very small",
  numbers <= 3 ~ "Small",
  numbers <= 6 ~ "Medium",
  numbers <= 8 ~ "Large",
  TRUE ~ "Huge!"
)
#> [1] Very small Large Medium Small Large Huge!
#> [7] Small Medium Very small Large Medium Medium
#> [13] Very small Huge! Large
#> Levels: Very small Small Medium Large Huge!
This has the advantage of not having to manually specify the factor levels.
I have also submitted a feature request to dplyr for this functionality: https://github.com/tidyverse/dplyr/issues/6029
Upvotes: 1
Reputation: 712
While my solution replaces your piping with a messy intermediate variable, this works:
library(dplyr, warn.conflicts = FALSE)
set.seed(1234)
score <- runif(100, min = 0, max = 100)
Performance <- function(x) {
  t <- case_when(
    is.na(x) ~ NA_character_,
    x > 80 ~ 'Excellent',
    x > 50 ~ 'Good',
    TRUE ~ 'Fail'
  )
  # Unique labels, ordered by the score at each label's first occurrence (highest first)
  to <- subset(t, !duplicated(t))
  factor(t, levels = to[order(subset(x, !duplicated(t)), decreasing = TRUE)])
}
performance <- Performance(score)
levels(performance)
Edited to fix!
Upvotes: 1
Reputation: 7630
Levels are set in lexicographic order by default. If you don't want to specify them, you can either set them up so that lexicographic order is correct (Performance1), or create a levels vector once and use it both when generating the values and when setting the levels (Performance2). I don't know how much effort or tedium either of these would save you, but here they are. See my third suggestion for what I think is the least tedious way.
Performance1 <- function(x) {
  case_when(
    is.na(x) ~ NA_character_,
    x > 80 ~ 'Excellent',
    x <= 50 ~ 'Fail',
    TRUE ~ 'Good'
  ) %>% factor()
}
Performance2 <- function(x, levels = c("Excellent", "Good", "Fail")) {
  case_when(
    is.na(x) ~ NA_character_,
    x > 80 ~ levels[1],
    x > 50 ~ levels[2],
    TRUE ~ levels[3]
  ) %>% factor(levels)
}
performance1 <- Performance1(score)
levels(performance1)
# [1] "Excellent" "Fail"      "Good"
table(performance1)
# performance1
# Excellent      Fail      Good
#        15        55        30
performance2 <- Performance2(score)
levels(performance2)
# [1] "Excellent" "Good"      "Fail"
table(performance2)
# performance2
# Excellent      Good      Fail
#        15        30        55
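To see the default ordering in isolation, here is a small base-R sketch (the vector `x` is made up for illustration). factor() sorts the unique values, whereas levels = unique(x) keeps the order of first appearance. The latter depends on the data, so it is not a reliable substitute for spelling the levels out.

```r
x <- c("Good", "Fail", "Excellent", "Good")

# Default: unique values in sorted order
default_levels <- levels(factor(x))
default_levels
# [1] "Excellent" "Fail"      "Good"

# Order of first appearance in the data
appearance_levels <- levels(factor(x, levels = unique(x)))
appearance_levels
# [1] "Good"      "Fail"      "Excellent"
```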
If I could suggest an even less tedious way:
performance <- cut(score, breaks = c(0, 50, 80, 100),
                   labels = c("Fail", "Good", "Excellent"),
                   include.lowest = TRUE)  # otherwise a score of exactly 0 becomes NA
levels(performance)
# [1] "Fail"      "Good"      "Excellent"
table(performance)
# performance
#      Fail      Good Excellent
#        55        30        15
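The boundary behaviour of cut() is worth checking against the case_when conditions. By default the intervals are closed on the right, so 50 falls in "Fail" and 80 in "Good", which matches x > 50 and x > 80. The lowest break itself is excluded unless include.lowest = TRUE. A small sketch with made-up boundary values:

```r
breaks <- c(0, 50, 80, 100)
labels <- c("Fail", "Good", "Excellent")
edge_scores <- c(0, 50, 50.1, 80, 80.1, 100)  # made-up boundary values

# Default: intervals (0,50], (50,80], (80,100], so a score of 0 is not covered
without_lowest <- as.character(cut(edge_scores, breaks = breaks, labels = labels))
without_lowest
# [1] NA          "Fail"      "Good"      "Good"      "Excellent" "Excellent"

# include.lowest = TRUE closes the first interval on the left: [0,50]
with_lowest <- as.character(
  cut(edge_scores, breaks = breaks, labels = labels, include.lowest = TRUE)
)
with_lowest
# [1] "Fail"      "Fail"      "Good"      "Good"      "Excellent" "Excellent"
```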
Upvotes: 4