Nancy Zhu
Nancy Zhu

Reputation: 40

r calculate grouped counts for multiple factor variables

For demonstration purpose, I have dataset as following:

df <- data.frame(A = as.factor(floor(runif(20,1,6))),
                 B = as.factor(floor(runif(20,1,6))),
                 C = as.factor(floor(runif(20,1,6))),
                 D = c(rep('X',3), rep("Y",7), rep('Z',10)))

How can I iterate through column A, B and C to get counts for

count(df, D, A), count(df, D, B) and count(df, D, C)

This is a simplified version, if I need to do this for 20 or more variables, how can I automate the process?

I have tried:

f <- function(x) count(df, D, x)
result <- bind_rows(lapply(df[ , c('A','B','C')], f))

and I got the following error:

 Error in grouped_df_impl(data, unname(vars), drop) : 
 Column `x` is unknown

Upvotes: 1

Views: 260

Answers (2)

markdly
markdly

Reputation: 4534

Would using tidyr::gather first work for you so you can do the count all at once for the different variables? As @alistaire noted in the comments, this can be done using

df %>% gather(key, value, -D) %>% count(D, key, value)

which results in the same output as my unnecessary extra use of group_by

df %>% gather(key, value, -D) %>% group_by(D, key) %>% count(value)

Worked Solution

library(tidyverse)
df %>% gather(key, value, -D) %>% group_by(D, key) %>% count(value)
#> # A tibble: 34 x 4
#> # Groups:   D, key [9]
#>         D   key value     n
#>    <fctr> <chr> <chr> <int>
#>  1      X     A     2     1
#>  2      X     A     3     1
#>  3      X     A     4     1
#>  4      X     B     4     2
#>  5      X     B     5     1
#>  6      X     C     1     1
#>  7      X     C     3     2
#>  8      Y     A     1     1
#>  9      Y     A     3     3
#> 10      Y     A     5     3
#> # ... with 24 more rows

Source data

set.seed(123)
df<-data.frame(A=as.factor(floor(runif(20,1,6))),
               B=as.factor(floor(runif(20,1,6))),
               C=as.factor(floor(runif(20,1,6))),
               D=c(rep('X',3),rep("Y",7),rep('Z',10)))  

Upvotes: 2

akrun
akrun

Reputation: 886938

We can use map2 to do the individual count of the subset of columns that involve columns other than 'D' with that of 'D'

library(tidyverse)
lst <-  map2(names(df)[1:3], names(df)[4], ~count(df[c(.x, .y)], 
                           !!!rlang::syms(c(.x, .y))))


lst
#[[1]]
# A tibble: 11 x 3
#        A      D     n
#   <fctr> <fctr> <int>
# 1      1      Z     2
# 2      2      X     1
# 3      2      Y     1
# 4      2      Z     2
# 5      3      X     2
# 6      3      Y     2
# 7      3      Z     4
# 8      4      Y     2
# 9      4      Z     1
#10      5      Y     2
#11      5      Z     1

#[[2]]
# A tibble: 11 x 3
#        B      D     n
#   <fctr> <fctr> <int>
# 1      1      Y     2
# 2      1      Z     2
# 3      2      Y     1
# 4      2      Z     1
# 5      3      Y     1
# 6      3      Z     2
# 7      4      X     3
# 8      4      Y     2
# 9      4      Z     3
#10      5      Y     1
#11      5      Z     2

#[[3]]
# A tibble: 12 x 3
#        C      D     n
#   <fctr> <fctr> <int>
# 1      1      Y     1
# 2      1      Z     1
# 3      2      X     2
# 4      2      Y     1
# 5      2      Z     4
# 6      3      X     1
# 7      3      Y     2
# 8      3      Z     1
# 9      4      Y     2
#10      4      Z     3
#11      5      Y     1
#12      5      Z     1

It is not clear whether to have a single dataset or a list of datasets

Upvotes: 1

Related Questions