Get most frequently occurring factor level in dplyr piping structure

Question

I'd like to find the most frequently occurring level of a factor in a dataset while using dplyr's piping structure. I'm trying to create a new variable that contains the 'modal' factor level within each group defined by another variable.

This is an example of what I'm looking for:

library(dplyr)  # needed so that %>% is available

df <- data.frame(cat = stringi::stri_rand_strings(100, 1, '[A-Z]'), num = floor(runif(100, min=0, max=500)))
df <- df %>%
    dplyr::group_by(cat) %>%
    dplyr::mutate(cat_mode = Mode(num))  # Mode() is the function I'm looking for

Here "Mode" is the function I'm looking for.

akuiper · Accepted Answer

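Use table to count the items, then which.max to find the most frequent one. Because names() returns a string, cat_mode comes back as a character column (wrap it in as.numeric() if you need a number), and ties go to the smallest value: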

df %>%
    group_by(cat) %>%
    mutate(cat_mode = names(which.max(table(num)))) %>% 
    head()

# A tibble: 6 x 3
# Groups: cat [4]
#  cat      num cat_mode
#  <fct>  <dbl> <chr>
#1 Q      305   138
#2 W       34.0 212
#3 R       53.0 53
#4 D      395   5
#5 W      212   212
#6 Q      417   138
# ...
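
If you would rather have cat_mode keep the original type of num (the names() approach above always yields character), one option is a small helper built on unique, match, and tabulate. This is only a sketch: Mode here is a hypothetical user-defined function, not something provided by base R or dplyr, and the tiny data frame is made up for illustration. Unlike the table version, ties go to whichever value appears first in the group.

```r
library(dplyr)

# Hypothetical helper (not part of base R or dplyr): returns the most
# frequent value of x while keeping x's type. On ties, the value that
# appears first in x wins.
Mode <- function(x) {
  ux <- unique(x)                         # distinct values, in order of appearance
  counts <- tabulate(match(x, ux))        # how often each distinct value occurs
  ux[which.max(counts)]                   # first value with the highest count
}

# Small made-up data set, just for illustration
df <- data.frame(
  cat = c("A", "A", "A", "B", "B"),
  num = c(10, 20, 20, 5, 5)
)

res <- df %>%
  group_by(cat) %>%
  mutate(cat_mode = Mode(num)) %>%
  ungroup()
```

In this example cat_mode is numeric: 20 for every row in group A and 5 for every row in group B. The same helper should also work on a factor or character column, since unique and match handle both.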
