Mark

Reputation: 2899

Count by several columns in an lapply loop

I'm using a line of code like this:

mpg %>% count(~cyl)

but now I'm trying to run the count (and dozens of other lines of code after it) inside an lapply loop.

What I would like to do is count the data by each of several columns in turn.

i.e.:

groupby <- c("cyl", "year", "trans")

lapply(groupby, function(x) { 
mpg %>% count(~x)
})

However, I can't get mpg %>% count(~x) to work when x is a string, as it is in the loop.

I tried using as.function() in a few ways, but without success. I'm sure someone here knows the solution and can answer faster than I could by spending 4 hours on Google reinventing the wheel.

Thanks in advance if you know how to get this working!

P.S. My columns to group by are all factors; all other columns are numerical.

Upvotes: 2

Views: 253

Answers (2)

Maurits Evers

Reputation: 50718

Aside from @akrun's more elegant solution you could also do something like this:

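groupby <- c("cyl", "year", "trans")
library(dplyr)
library(tidyr)    # gather() comes from tidyr
library(ggplot2)  # provides the mpg dataset
# Stack the selected columns into key/value pairs, then count each pair
mpg[groupby] %>% 
    gather(key, value) %>% 
    count(key, value)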
## A tibble: 16 x 3
#   key   value          n
#   <chr> <chr>      <int>
# 1 cyl   4             81
# 2 cyl   5              4
# 3 cyl   6             79
# 4 cyl   8             70
# 5 trans auto(av)       5
# 6 trans auto(l3)       2
# 7 trans auto(l4)      83
# 8 trans auto(l5)      39
# 9 trans auto(l6)       6
#10 trans auto(s4)       3
#11 trans auto(s5)       3
#12 trans auto(s6)      16
#13 trans manual(m5)    58
#14 trans manual(m6)    19
#15 year  1999         117
#16 year  2008         117

This produces a single data.frame/tibble, which you can process further, e.g. by grouping entries by key.
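As a rough sketch of that further processing, and assuming a tidyr version that provides pivot_longer (the newer counterpart of gather) and that ggplot2 is installed for mpg, the combined counts can be split back into one table per column. The names long_counts and per_column are only illustrative:

```r
library(dplyr)
library(tidyr)

groupby <- c("cyl", "year", "trans")

long_counts <- ggplot2::mpg[groupby] %>%
  # pivot_longer() will not stack integer and character columns together,
  # so convert everything to character first
  mutate(across(everything(), as.character)) %>%
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%
  count(key, value)

# One tibble per original column, named by the column
per_column <- split(long_counts, long_counts$key)
```

per_column$trans then holds only the trans counts, which is roughly what the lapply approach would have returned for that column.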


Update

The above solution also works on factor levels. For example:

iris[c("Species")] %>% 
    gather(key, value) %>%
    count(key, value)
## A tibble: 3 x 3
#  key     value          n
#  <chr>   <chr>      <int>
#1 Species setosa        50
#2 Species versicolor    50
#3 Species virginica     50

Upvotes: 1

akrun

Reputation: 887571

We can convert each string to a symbol with sym from rlang and then unquote it with !!, so that count sees a column name rather than a string:

library(tidyverse)    
map(groupby, ~ 
         mpg %>%
           count(!!rlang::sym(.x)))
#[[1]]
# A tibble: 4 x 2
#    cyl     n
#  <int> <int>
#1     4    81
#2     5     4
#3     6    79
#4     8    70

#[[2]]
# A tibble: 2 x 2
#   year     n
#  <int> <int>
#1  1999   117
#2  2008   117

#[[3]]
# A tibble: 10 x 2
#   trans          n
#   <chr>      <int>
# 1 auto(av)       5
# 2 auto(l3)       2
# 3 auto(l4)      83
# 4 auto(l5)      39
# 5 auto(l6)       6
# 6 auto(s4)       3
# 7 auto(s5)       3
# 8 auto(s6)      16
# 9 manual(m5)    58
#10 manual(m6)    19

Another option is to use group_by_at with summarise, since group_by_at accepts column names as strings:

map(groupby, ~ mpg %>%
                group_by_at(.x) %>% 
                summarise(n = n()))
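The same idea can also be written with base lapply, as in the question. This is only a sketch, assuming a reasonably recent dplyr in which the .data pronoun can be used inside count, and that ggplot2 is installed for mpg. The name counts is illustrative:

```r
library(dplyr)

groupby <- c("cyl", "year", "trans")

# .data[[x]] looks up the column whose name is stored in the string x
counts <- lapply(groupby, function(x) {
  ggplot2::mpg %>% count(.data[[x]])
})

# Name the list elements so each result can be retrieved by column name
names(counts) <- groupby
```

counts[["cyl"]] then gives the count table for cyl, and the other lines of the loop body can follow the count call inside the function.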

data

data(mpg)

Upvotes: 2
