Reputation: 11657
I'd like to find the most frequently occurring level of a factor in a dataset while using dplyr's piping structure. I'm trying to create a new variable that contains the 'modal' factor level within each group defined by another variable.
This is an example of what I'm looking for:
library(dplyr)  # provides %>%, group_by() and mutate()
df <- data.frame(cat = stringi::stri_rand_strings(100, 1, '[A-Z]'), num = floor(runif(100, min = 0, max = 500)))
df <- df %>%
  dplyr::group_by(cat) %>%
  dplyr::mutate(cat_mode = Mode(num))
Here, "Mode" is the function I'm looking for.
Upvotes: 2
Views: 783
Reputation: 626
This is similar to the question "Is there a built-in function for finding the mode?"
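# Return the most frequent value of x (the first one encountered wins ties)
Mode <- function(x) {
  ux <- unique(x)                          # distinct values, in order of appearance
  ux[which.max(tabulate(match(x, ux)))]    # count each value, pick the largest count
}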
df %>%
  group_by(cat) %>%
  mutate(cat_mode = Mode(num))
# A tibble: 100 x 3
# Groups: cat [26]
cat num cat_mode
<fct> <dbl> <dbl>
1 S 25 25
2 V 86 478
3 R 335 335
4 S 288 25
5 S 330 25
6 Q 384 384
7 C 313 313
8 H 275 275
9 K 274 274
10 J 75 75
# ... with 90 more rows
To see a single mode for each level of cat, use summarise instead of mutate:
df %>%
  group_by(cat) %>%
  summarise(cat_mode = Mode(num))
# A tibble: 26 x 2
cat cat_mode
<fct> <dbl>
1 A 480
2 B 380
3 C 313
4 D 253
5 E 202
6 F 52
7 G 182
8 H 275
9 I 356
10 J 75
# ... with 16 more rows
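Since the question asks about factor levels, here is a minimal sketch of how the Mode function above behaves on a factor and on ties. The vector f and the tied vector are hypothetical toy inputs, not taken from the data above.

```r
# Mode as defined above: the first-encountered value wins ties
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical toy factor, not part of the original df
f <- factor(c("b", "a", "b", "c", "a", "b"))
mode_f <- Mode(f)            # still a factor; its value is "b"
as.character(mode_f)         # plain string, if a factor is not wanted

# Tie: 2 and 1 both occur twice; 2 appears first, so it is returned
mode_tie <- Mode(c(2, 1, 2, 1))
```

Because the result keeps the type of its input, the same function should work inside mutate or summarise whether the column is numeric, character, or a factor.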
Upvotes: 1
Reputation: 214927
Use table to count the items and then use which.max to find the most frequent one:
df %>%
  group_by(cat) %>%
  mutate(cat_mode = names(which.max(table(num)))) %>%
  head()
# A tibble: 6 x 3
# Groups: cat [4]
# cat num cat_mode
# <fctr> <dbl> <chr>
#1 Q 305 138
#2 W 34.0 212
#3 R 53.0 53
#4 D 395 5
#5 W 212 212
#6 Q 417 138
# ...
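Note that cat_mode comes back as <chr> in the output above, because names() returns character strings. If a numeric column is wanted, one option is to wrap the result in as.numeric. This is only a sketch on a hypothetical toy data frame (toy), not the df from the question.

```r
library(dplyr)

# Hypothetical toy data, not the df from the question
toy <- data.frame(cat = c("A", "A", "A", "B", "B"),
                  num = c(5, 5, 7, 3, 3))

res <- toy %>%
  group_by(cat) %>%
  # names() yields a string such as "5"; convert it back to a number
  mutate(cat_mode = as.numeric(names(which.max(table(num))))) %>%
  ungroup()

res$cat_mode
```

One difference worth keeping in mind: table sorts its entries, so on a tie this approach should return the smallest tied value rather than the first one encountered.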
Upvotes: 1