Reputation: 40
For demonstration purpose, I have dataset as following:
df <- data.frame(A = as.factor(floor(runif(20,1,6))),
B = as.factor(floor(runif(20,1,6))),
C = as.factor(floor(runif(20,1,6))),
D = c(rep('X',3), rep("Y",7), rep('Z',10)))
How can I iterate through column A, B and C to get counts for
count(df, D, A), count(df, D, B) and count(df, D, C)
This is a simplified version, if I need to do this for 20 or more variables, how can I automate the process?
I have tried:
f <- function(x) count(df, D, x)
result <- bind_rows(lapply(df[ , c('A','B','C')], f))
and I got the following error:
Error in grouped_df_impl(data, unname(vars), drop) :
Column `x` is unknown
Upvotes: 1
Views: 260
Reputation: 4534
Would using tidyr::gather
first work for you so you can do the count
all at once for the different variables? As @alistaire noted in the comments, this can be done using
df %>% gather(key, value, -D) %>% count(D, key, value)
which results in the same output as my unnecessary extra use of group_by
df %>% gather(key, value, -D) %>% group_by(D, key) %>% count(value)
Worked Solution
library(tidyverse)
df %>% gather(key, value, -D) %>% group_by(D, key) %>% count(value)
#> # A tibble: 34 x 4
#> # Groups: D, key [9]
#> D key value n
#> <fctr> <chr> <chr> <int>
#> 1 X A 2 1
#> 2 X A 3 1
#> 3 X A 4 1
#> 4 X B 4 2
#> 5 X B 5 1
#> 6 X C 1 1
#> 7 X C 3 2
#> 8 Y A 1 1
#> 9 Y A 3 3
#> 10 Y A 5 3
#> # ... with 24 more rows
Source data
set.seed(123)
df<-data.frame(A=as.factor(floor(runif(20,1,6))),
B=as.factor(floor(runif(20,1,6))),
C=as.factor(floor(runif(20,1,6))),
D=c(rep('X',3),rep("Y",7),rep('Z',10)))
Upvotes: 2
Reputation: 886938
We can use map2
to do the individual count
of the subset of columns that involve columns other than 'D' with that of 'D'
library(tidyverse)
lst <- map2(names(df)[1:3], names(df)[4], ~count(df[c(.x, .y)],
!!!rlang::syms(c(.x, .y))))
lst
#[[1]]
# A tibble: 11 x 3
# A D n
# <fctr> <fctr> <int>
# 1 1 Z 2
# 2 2 X 1
# 3 2 Y 1
# 4 2 Z 2
# 5 3 X 2
# 6 3 Y 2
# 7 3 Z 4
# 8 4 Y 2
# 9 4 Z 1
#10 5 Y 2
#11 5 Z 1
#[[2]]
# A tibble: 11 x 3
# B D n
# <fctr> <fctr> <int>
# 1 1 Y 2
# 2 1 Z 2
# 3 2 Y 1
# 4 2 Z 1
# 5 3 Y 1
# 6 3 Z 2
# 7 4 X 3
# 8 4 Y 2
# 9 4 Z 3
#10 5 Y 1
#11 5 Z 2
#[[3]]
# A tibble: 12 x 3
# C D n
# <fctr> <fctr> <int>
# 1 1 Y 1
# 2 1 Z 1
# 3 2 X 2
# 4 2 Y 1
# 5 2 Z 4
# 6 3 X 1
# 7 3 Y 2
# 8 3 Z 1
# 9 4 Y 2
#10 4 Z 3
#11 5 Y 1
#12 5 Z 1
It is not clear whether to have a single dataset or a list
of datasets
Upvotes: 1