gojomoso
gojomoso

Reputation: 163

Apply Multiple Columns to Custom function Using dplyr::mutate(across())

df

a = c("aa", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb","bb") 
b = c("aa", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb","bb") 
c = c("aa", "aa", "aa", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb", "cc", "bb", "bb", "cc","bb", "bb", "cc", "cc", "bb","bb") 
d = c(1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 1)
df = data.frame(a,b,c,d)

Col Names:

cols <- c("a","b","c")

Function:

rare_label <- function(x){
  freq = prop.table(table(unlist(x)))
  make_rare = names(freq)[freq < 0.20]
  lapply(x,
         function(x) {
           replace(x, x %in% make_rare, "Rare")
         })}

Looking to evaluate with dplyr::mutate(across()) the proportion of all values combined in a, b, c and then change any category with a proportion below 20% to "Rare".

Output:

     a    b    c
    Rare Rare Rare
    bb   bb   Rare
    cc   cc   Rare
    bb   bb   bb
    bb   bb   bb
    cc   cc   cc
    bb   bb   bb
    .    .    .
    .    .    .
    .    .    .
    

Using the code below throws an error and I'm not sure why.

df %<>%
  mutate(across(where(cols), ~rare_label(.)

Error: unexpected symbol in: " mutate(across(where(cols), ~rare_label(.) View"

Upvotes: 2

Views: 3823

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 388797

Your function is correct but you need to do 2 changes.

  1. Remove lapply and keep the last line as :
replace(x, x %in% make_rare, "Rare")
  1. Remove where from across since you are calling column by their names.

After doing those changes your code should work.


Another option is to use forcats package which has functions to do this kind of thing.

library(dplyr)
library(forcats)

df %>% 
  mutate(across(all_of(cols),fct_lump_min, min = n() * .2, other_level = "rare"))

#      a    b    c d
#1  rare rare rare 1
#2    bb   bb rare 1
#3    cc   cc rare 2
#4    bb   bb   bb 2
#5    bb   bb   bb 3
#6    cc   cc   cc 3
#7    bb   bb   bb 1
#8    bb   bb   bb 1
#9    cc   cc   cc 1
#10   cc   cc   cc 1
#11   bb   bb   bb 1
#12   cc   cc   cc 1
#13   bb   bb   bb 2
#...

fct_lump_min changes all the factors to "rare" which occur less than 20% (0.2 * n()). Here we are passing a number for n to drop levels, I couldn't find a function which works by passing proportion itself, fct_lump_prop does something else.

Upvotes: 2

tmfmnk
tmfmnk

Reputation: 39858

One option could be:

df %>%
 mutate(across(all_of(cols), 
               ~ replace(., . %in% names(which(prop.table(table(.)) < 0.20)), "rare")))

      a    b    c d
1  rare rare rare 1
2    bb   bb rare 1
3    cc   cc rare 2
4    bb   bb   bb 2
5    bb   bb   bb 3
6    cc   cc   cc 3
7    bb   bb   bb 1
8    bb   bb   bb 1
9    cc   cc   cc 1
10   cc   cc   cc 1

If you want to apply an existing function:

fun <- function(x) replace(x, x %in% names(which(prop.table(table(x)) < 0.20)), "rare")

df %>%
 mutate(across(all_of(cols), fun))

Upvotes: 4

Duck
Duck

Reputation: 39585

Your code works well, just change pipe and values like this:

#Code
df %>%
    mutate(across(c(a:c), ~rare_label(.))

Output:

      a    b    c d
1  Rare Rare Rare 1
2    bb   bb Rare 1
3    cc   cc Rare 2
4    bb   bb   bb 2
5    bb   bb   bb 3
6    cc   cc   cc 3
7    bb   bb   bb 1
8    bb   bb   bb 1
9    cc   cc   cc 1
10   cc   cc   cc 1
11   bb   bb   bb 1
12   cc   cc   cc 1
13   bb   bb   bb 2
14   bb   bb   bb 2
15   cc   cc   cc 3
16   bb   bb   bb 3
17   bb   bb   bb 1
18   cc   cc   cc 1
19   cc   cc   cc 1
20   bb   bb   bb 1
21   bb   bb   bb 1

Upvotes: 5

Related Questions