Alex

Reputation: 1304

Indexing in user-written functions to iterate over multiple variables using map

I would like to tabulate multiple variables at once.

I have written the following function, which works well on a single variable:

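library(dplyr)  # group_by(), summarise(), arrange(), mutate(), %>%
library(purrr)  # map()

# Frequency table for one column, passed unquoted (note: masks base::tabulate)
tabulate <- function(data, var1) {
  data %>%
    group_by({{ var1 }}) %>%
    summarise(n = n()) %>%
    arrange(-n) %>%
    mutate(totalN = cumsum(n),
           percent = round(n / sum(n), 3),
           cumpercent = round(cumsum(n / sum(n)), 3))
}
tabulate(iris, Sepal.Length)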
#   Sepal.Length     n totalN percent cumpercent
#          <dbl> <int>  <int>   <dbl>      <dbl>
# 1          5      10     10   0.067      0.067
# 2          5.1     9     19   0.06       0.127
# 3          6.3     9     28   0.06       0.187
# 4          5.7     8     36   0.053      0.24 

I would like to iterate this over a series of variables using map(). I tried the following, but it fails with an error saying it cannot find x:

var <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
map(var, ~ tabulate(iris, .x))

I know I can use get(var1) inside the function and it works, but the grouping column is then labelled `get(var1)` instead of the variable name, which makes it harder to tell which variable each table refers to.

tabulate_get <- function(data, var1) {
  data %>%
    group_by(get(var1)) %>%
    summarise(n = n()) %>%
    arrange(-n) %>%
    mutate(totalN = cumsum(n),
           percent = round(n / sum(n), 3),
           cumpercent = round(cumsum(n / sum(n)), 3))
}
map(var, ~ tabulate_get(iris, .x))

# Note that the output prints `get(var1)` rather than the name of the variable used, which makes interpretation harder.
#   `get(var1)`     n totalN percent cumpercent
#         <dbl> <int>  <int>   <dbl>      <dbl>
# 1         5      10     10   0.067      0.067
# 2         5.1     9     19   0.06       0.127
# 3         6.3     9     28   0.06       0.187
# 4         5.7     8     36   0.053      0.24 

Is there a concise way to use map() over the variable names without using get()? Alternatively, should I specify my variables differently?

Thanks a lot for your help.

Upvotes: 1

Views: 41

Answers (1)

tmfmnk

Reputation: 39858

You can pass the column name as a string and select it with across(all_of()) inside group_by():

# var is a character string naming the column to tabulate
tabulate <- function(data, var) {
  data %>%
    group_by(across(all_of(var))) %>%
    summarise(n = n()) %>%
    arrange(-n) %>%
    mutate(totalN = cumsum(n),
           percent = round(n / sum(n), 3),
           cumpercent = round(cumsum(n / sum(n)), 3))
}

map(.x = var, ~ tabulate(iris, .x))

[[1]]
# A tibble: 5 x 5
  Sepal.Length     n totalN percent cumpercent
         <dbl> <int>  <int>   <dbl>      <dbl>
1          5      10     10   0.067      0.067
2          5.1     9     19   0.06       0.127
3          6.3     9     28   0.06       0.187
4          5.7     8     36   0.053      0.24 
5          6.7     8     44   0.053      0.293

[[2]]
# A tibble: 5 x 5
  Sepal.Width     n totalN percent cumpercent
        <dbl> <int>  <int>   <dbl>      <dbl>
1         3      26     26   0.173      0.173
2         2.8    14     40   0.093      0.267
3         3.2    13     53   0.087      0.353
4         3.4    12     65   0.08       0.433
5         3.1    11     76   0.073      0.507

[[3]]
# A tibble: 5 x 5
  Petal.Length     n totalN percent cumpercent
         <dbl> <int>  <int>   <dbl>      <dbl>
1          1.4    13     13   0.087      0.087
2          1.5    13     26   0.087      0.173
3          4.5     8     34   0.053      0.227
4          5.1     8     42   0.053      0.28 
5          1.3     7     49   0.047      0.327
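
If you would rather keep the original `{{ }}` version of the function, one option is to turn each string into a symbol and inject it at the call site. The following is only a sketch under a few assumptions: `tabulate_embrace` and `vars` are names made up for this example (the function is renamed so it does not mask base::tabulate), and rlang is loaded for `sym()`. Wrapping the names in `set_names()` also labels each list element with its variable.

```r
library(dplyr)
library(purrr)
library(rlang)  # sym() and the !! injection operator

# Hypothetical name: same logic as the embraced function in the question
tabulate_embrace <- function(data, var1) {
  data %>%
    group_by({{ var1 }}) %>%
    summarise(n = n()) %>%
    arrange(-n) %>%
    mutate(totalN = cumsum(n),
           percent = round(n / sum(n), 3),
           cumpercent = round(cumsum(n / sum(n)), 3))
}

# Hypothetical name for the vector of column names
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

# sym() converts each string to a symbol; !! injects it so that
# {{ var1 }} sees Sepal.Length rather than .x
res <- map(set_names(vars), ~ tabulate_embrace(iris, !!sym(.x)))

names(res)
res$Sepal.Length
```

The grouping column keeps the real variable name, and each table can be pulled out by name (for example `res$Sepal.Width`) instead of by position.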

Upvotes: 2
