Reputation: 867
I am trying to find the best way to iterate through each column of a data frame, group by that column, and produce a summary. Here is my attempt:
library(tidyverse)
data = data.frame(
  a = sample(LETTERS[1:3], 100, replace = TRUE),
  b = sample(LETTERS[1:8], 100, replace = TRUE),
  c = sample(LETTERS[3:15], 100, replace = TRUE),
  d = sample(LETTERS[16:26], 100, replace = TRUE),
  value = rnorm(100),
  stringsAsFactors = TRUE  # R >= 4.0 no longer converts strings to factors by default
)
myfunction <- function(x) {
  # names of the grouping (factor) columns
  groupVars <- select_if(x, is.factor) %>% colnames()
  results <- list()
  # seq_along() avoids looping over 1:0 if no factor columns are found
  for (i in seq_along(groupVars)) {
    results[[i]] <- x %>%
      group_by_at(.vars = vars(groupVars[i])) %>%
      summarise(
        n = n()
      )
  }
  return(results)
}
test <- myfunction(data)
The function returns:
[[1]]
# A tibble: 3 x 2
a n
<fct> <int>
1 A 37
2 B 34
3 C 29
...
...
...
My question is, is this the best way to do this? Is there a way to avoid using a for loop? Can I use purrr and map somehow to do this?
Thank you
Upvotes: 0
Views: 129
Reputation: 50678
One option is to use purrr::map:
library(tidyverse)
map(data[1:4], ~data.frame(x = {{.x}}) %>% count(x))
#$a
## A tibble: 3 x 2
# x n
# <fct> <int>
#1 A 39
#2 B 32
#3 C 29
#
#$b
## A tibble: 8 x 2
# x n
# <fct> <int>
#1 A 14
#2 B 11
#3 C 16
#4 D 10
#5 E 12
#6 F 10
#7 G 13
#8 H 14
#...
The output is a list. Note that I have ignored the last column of data, as it doesn't seem to be relevant here.
If you want the columns in the list of data.frames to be named according to the columns of your original data, we can use imap:
imap(data[1:4], ~tibble(!!.y := {{.x}}) %>% count(!!sym(.y)))
#$a
## A tibble: 3 x 2
# a n
# <fct> <int>
#1 A 23
#2 B 35
#3 C 42
#
#$b
## A tibble: 8 x 2
# b n
# <fct> <int>
#1 A 15
#2 B 10
#3 C 13
#4 D 5
#5 E 19
#6 F 9
#7 G 13
#8 H 16
#...
Or, making use of tibble::enframe (thanks @camille):
imap(data[1:4], ~enframe(.x, value = .y) %>% count(!!sym(.y)))
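If you would rather keep the original data frame and group it by name, a variant that maps over the column names and uses the .data pronoun should also work. This is a sketch assuming dplyr >= 1.0 and purrr are installed; group_cols and counts are illustrative names, and stringsAsFactors = TRUE is an assumption added so that is.factor picks up a to d on R >= 4.0.

```r
library(dplyr)
library(purrr)

set.seed(123)
data <- data.frame(
  a = sample(LETTERS[1:3], 100, replace = TRUE),
  b = sample(LETTERS[1:8], 100, replace = TRUE),
  c = sample(LETTERS[3:15], 100, replace = TRUE),
  d = sample(LETTERS[16:26], 100, replace = TRUE),
  value = rnorm(100),
  stringsAsFactors = TRUE  # assumption: make a to d factors on R >= 4.0 too
)

# Names of the factor columns to group by (here "a", "b", "c", "d")
group_cols <- names(Filter(is.factor, data))

# set_names() names the output list after the columns;
# .data[[col]] turns the string into a column reference inside count()
counts <- map(set_names(group_cols), function(col) count(data, .data[[col]]))
```

counts$a then holds the tally for column a. Compared with the for loop, map() creates and fills the result list for you, and set_names() gives each element the name of its column.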
Upvotes: 2
Reputation: 1203
You can simply call:
apply(data, 2, table)
This returns a list with one frequency table per column; you can drop the last element (the table for value) if you don't need it.
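apply() converts the data frame with as.matrix() first, so with a numeric column present every column is turned into character before tabulating. A base R sketch that tabulates each column directly uses lapply() instead; treating every non-numeric column as a grouping column is an assumption, and is_group_col and tabs are illustrative names.

```r
set.seed(123)
data <- data.frame(
  a = sample(LETTERS[1:3], 100, replace = TRUE),
  b = sample(LETTERS[1:8], 100, replace = TRUE),
  c = sample(LETTERS[3:15], 100, replace = TRUE),
  d = sample(LETTERS[16:26], 100, replace = TRUE),
  value = rnorm(100)
)

# Assumption: grouping columns are the non-numeric ones (drops `value`)
is_group_col <- !vapply(data, is.numeric, logical(1))

# lapply() walks the data frame column by column, so no matrix coercion happens
tabs <- lapply(data[is_group_col], table)
```

The result is a named list (tabs$a, tabs$b, ...) without the extra element for value.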
Upvotes: 0
Reputation: 15072
You could reshape the data and group by both the column and the letter. This gives you one dataframe instead of a list of them, but you could get the list if you really want it with split.
set.seed(123)
library(tidyverse)
data = data.frame(
  a = sample(LETTERS[1:3], 100, replace = TRUE),
  b = sample(LETTERS[1:8], 100, replace = TRUE),
  c = sample(LETTERS[3:15], 100, replace = TRUE),
  d = sample(LETTERS[16:26], 100, replace = TRUE),
  value = rnorm(100)
)
data %>%
  pivot_longer(cols = -value, names_to = "column", values_to = "letter") %>%
  group_by(column, letter) %>%
  summarise(n = n())
#> # A tibble: 35 x 3
#> # Groups: column [4]
#> column letter n
#> <chr> <fct> <int>
#> 1 a A 33
#> 2 a B 32
#> 3 a C 35
#> 4 b A 8
#> 5 b B 11
#> 6 b C 12
#> 7 b D 14
#> 8 b E 8
#> 9 b F 17
#> 10 b G 16
#> # … with 25 more rows
Created on 2019-10-30 by the reprex package (v0.3.0)
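To get the list of data frames mentioned at the start of this answer, one way is to split the long summary by its column variable. This is a sketch assuming tidyr >= 1.0 (for pivot_longer()); count(column, letter) is used here as shorthand for group_by() plus summarise(n = n()), and counts and count_list are illustrative names.

```r
library(dplyr)
library(tidyr)

set.seed(123)
data <- data.frame(
  a = sample(LETTERS[1:3], 100, replace = TRUE),
  b = sample(LETTERS[1:8], 100, replace = TRUE),
  c = sample(LETTERS[3:15], 100, replace = TRUE),
  d = sample(LETTERS[16:26], 100, replace = TRUE),
  value = rnorm(100),
  stringsAsFactors = FALSE  # keep the letters as character so they stack cleanly
)

# One row per (column, letter) pair with its count
counts <- data %>%
  pivot_longer(cols = -value, names_to = "column", values_to = "letter") %>%
  count(column, letter)

# Split the single long table into a named list, one tibble per original column
count_list <- split(counts, counts$column)
```

Each element, for example count_list$b, keeps the column variable; drop it with select(-column) if you don't need it.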
Upvotes: 1