Parseltongue

Reputation: 11657

Get most frequently occurring factor level in dplyr piping structure

I'd like to find the most frequently occurring level of a factor in a dataset while using dplyr's piping structure. Specifically, I'm trying to create a new variable that contains the 'modal' factor level within each group defined by another variable.

This is an example of what I'm looking for:

library(dplyr)  # provides %>%

df <- data.frame(cat = stringi::stri_rand_strings(100, 1, '[A-Z]'),
                 num = floor(runif(100, min = 0, max = 500)),
                 stringsAsFactors = TRUE)  # keep cat a factor on R >= 4.0
df <- df %>%
  dplyr::group_by(cat) %>%
  dplyr::mutate(cat_mode = Mode(num))

Here, "Mode" is the function I'm looking for.

Upvotes: 2

Views: 783

Answers (2)

Vivek Katial

Reputation: 626

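This is similar to the question "Is there a built-in function for finding the mode?". Base R has no built-in function for the statistical mode (mode() returns the storage mode of an object), so define one: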

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df %>% 
  group_by(cat) %>% 
  mutate(cat_mode = Mode(num))

# A tibble: 100 x 3
# Groups:   cat [26]
   cat     num cat_mode
   <fct> <dbl>    <dbl>
 1 S        25       25
 2 V        86      478
 3 R       335      335
 4 S       288       25
 5 S       330       25
 6 Q       384      384
 7 C       313      313
 8 H       275      275
 9 K       274      274
10 J        75       75
# ... with 90 more rows

To see the mode for each factor level, use summarise instead of mutate:

df %>% 
  group_by(cat) %>% 
  summarise(cat_mode = Mode(num))

# A tibble: 26 x 2
   cat   cat_mode
   <fct>    <dbl>
 1 A          480
 2 B          380
 3 C          313
 4 D          253
 5 E          202
 6 F           52
 7 G          182
 8 H          275
 9 I          356
10 J           75
# ... with 16 more rows
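
Since the question asks about factor levels, it may help to see that the same Mode function also works when the column is a factor rather than a number. The following is only a minimal sketch: the data frame toy and its columns grp and lvl are made up for illustration and are not part of the original question. When several values tie for the highest count, which.max picks the first maximum, so this Mode returns whichever tied value appears first in the data.

```r
library(dplyr)

# Same Mode() as defined above, repeated so the sketch runs on its own.
Mode <- function(x) {
  ux <- unique(x)                          # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]    # value with the highest count
}

# Hypothetical toy data: a grouping column and a factor column.
toy <- data.frame(
  grp = c("a", "a", "a", "b", "b", "b"),
  lvl = factor(c("x", "y", "y", "z", "z", "x"))
)

# One modal factor level per group.
res <- toy %>%
  group_by(grp) %>%
  summarise(lvl_mode = Mode(lvl))

# Ties: 3 and 1 both occur twice, and 3 appears first.
tie_result <- Mode(c(3, 1, 1, 3))
```

Here res should hold "y" for group a and "z" for group b, and tie_result should be 3. If a different tie-breaking rule is needed, the function would have to be adapted.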

Upvotes: 1

akuiper

Reputation: 214927

Use table to count the items and then use which.max to find out the most frequent one:

df %>%
    group_by(cat) %>%
    mutate(cat_mode = names(which.max(table(num)))) %>% 
    head()

# A tibble: 6 x 3
# Groups: cat [4]
#  cat      num cat_mode
#  <fctr> <dbl> <chr>   
#1 Q      305   138     
#2 W       34.0 212     
#3 R       53.0 53      
#4 D      395   5       
#5 W      212   212     
#6 Q      417   138  
# ...
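
Note that names() returns a character vector, which is why cat_mode shows up as <chr> in the output above. If a numeric column is wanted, one option is to wrap the result in as.numeric. This is a small sketch on made-up data (the data frame toy is hypothetical), not a claim about what the original poster needs. Also, table() sorts its entries, so when counts tie this approach returns the smallest tied value.

```r
library(dplyr)

# Hypothetical toy data for illustration.
toy <- data.frame(
  cat = c("A", "A", "A", "B", "B"),
  num = c(10, 20, 20, 5, 5)
)

res <- toy %>%
  group_by(cat) %>%
  # table() counts each value, which.max() picks the most frequent one,
  # names() extracts it as text, as.numeric() converts it back to a number.
  mutate(cat_mode = as.numeric(names(which.max(table(num))))) %>%
  ungroup()
```

With this toy data, cat_mode should be 20 for every row in group A and 5 for every row in group B.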

Upvotes: 1
