I'd like to relevel a factor variable based on the value of another variable. For instance:
factors <- structure(list(color = c("RED", "GREEN", "BLUE", "YELLOW", "BROWN"
), count = c(2, 5, 11, 1, 19)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
> factors
# A tibble: 5 x 2
color count
<chr> <dbl>
1 RED 2
2 GREEN 5
3 BLUE 11
4 YELLOW 1
5 BROWN 19
Here's what I want to produce:
##Group all levels with count < 10 into "OTHER"
> factors.out
# A tibble: 3 x 2
color count
<chr> <dbl>
1 OTHER 8
2 BLUE 11
3 BROWN 19
I thought this was a job for forcats::fct_lump():
##Keep 3 levels
> factors %>%
+   mutate(color = fct_lump(color, n = 3))
# A tibble: 5 x 2
color count
<fct> <dbl>
1 RED 2
2 GREEN 5
3 BLUE 11
4 YELLOW 1
5 BROWN 19
I know one can do that with something like:
factors %>%
  mutate(color = ifelse(count < 10, "OTHER", color)) %>%
  group_by(color) %>%
  summarise(count = sum(count))
But I was hoping there was a convenience function for this in forcats.
Upvotes: 1
Views: 679
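Because you already have a data frame of levels and their counts, you can pass the counts as weights (the w argument of fct_lump()) so that the rarest levels are lumped by total count rather than by number of rows. Without weights, each color appears in exactly one row, which is why your fct_lump() call left everything unchanged. With n = 2, the two levels with the largest counts are kept and the rest become OTHER. The second step then collapses the OTHER rows by summing their counts, as in your example.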
factors <- structure(list(color = c("RED", "GREEN", "BLUE", "YELLOW", "BROWN"),
    count = c(2, 5, 11, 1, 19)), row.names = c(NA, -5L), class = c("tbl_df",
    "tbl", "data.frame"))
library("dplyr")
library("forcats")
factors.out <- factors %>%
  # keep the 2 levels with the largest weighted counts, lump the rest
  mutate(color = fct_lump(color, n = 2, other_level = "OTHER",
                          w = count)) %>%
  # collapse the lumped rows into a single OTHER row
  group_by(color) %>%
  summarise(count = sum(count)) %>%
  arrange(count)
giving
factors.out
# A tibble: 3 x 2
color count
<fct> <dbl>
1 OTHER 8
2 BLUE 11
3 BROWN 19
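If what you want is the count < 10 cutoff from the question itself, rather than keeping a fixed number of levels, a threshold-based variant may be closer. The sketch below assumes a forcats version that provides fct_lump_min() (0.5.0 or later, as far as I know); the names threshold and factors.min are only illustrative, and the data is rebuilt with tibble() for brevity.

```r
library(dplyr)
library(forcats)

# Same data as in the question, rebuilt with tibble() for brevity
factors <- tibble(
  color = c("RED", "GREEN", "BLUE", "YELLOW", "BROWN"),
  count = c(2, 5, 11, 1, 19)
)

# Illustrative name: the cutoff taken from the question (count < 10)
threshold <- 10

factors.min <- factors %>%
  # lump every level whose weighted count is below the threshold
  mutate(color = fct_lump_min(color, min = threshold, w = count,
                              other_level = "OTHER")) %>%
  # collapse the lumped rows into a single OTHER row
  group_by(color) %>%
  summarise(count = sum(count)) %>%
  arrange(count)

factors.min
```

On this data it should give the same three rows as the n = 2 version, but the number of levels kept now depends on the threshold instead of being fixed in advance.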
Upvotes: 2