Reputation: 867

Is there a better way to do a group_by for each value in a list?

I am trying to find the best way to iterate through each column of a data frame, group by that column, and produce a summary. Here is my attempt:

library(tidyverse)
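# stringsAsFactors = TRUE keeps the letter columns as factors on R >= 4.0,
# where the default changed to FALSE
data = data.frame(
  a = sample(LETTERS[1:3], 100, replace=TRUE),
  b = sample(LETTERS[1:8], 100, replace=TRUE),
  c = sample(LETTERS[3:15], 100, replace=TRUE),
  d = sample(LETTERS[16:26], 100, replace=TRUE),
  value = rnorm(100),
  stringsAsFactors = TRUE
)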

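myfunction <- function(x) {
  # Names of the factor columns to group by
  groupVars <- select_if(x, is.factor) %>% colnames()
  results <- list()
  # seq_along() skips the loop when there are no factor columns;
  # 1:length(groupVars) would iterate over 1 and 0 in that case
  for(i in seq_along(groupVars)) {
    results[[i]] <- x %>%
      group_by_at(.vars = vars(groupVars[i])) %>%
      summarise(
        n = n()
      )
  }
  return(results)
}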

test <- myfunction(data)

The function returns:

[[1]]
# A tibble: 3 x 2
  a         n
  <fct> <int>
1 A        37
2 B        34
3 C        29
...
...
...

Is this the best way to do it? Is there a way to avoid the for loop, for example with purrr and map?

Thank you

Upvotes: 0

Answers (3)

Maurits Evers

Reputation: 50738

One option is to use purrr::map:

library(tidyverse)
map(data[1:4], ~data.frame(x = {{.x}}) %>% count(x))
#$a
## A tibble: 3 x 2
#  x         n
#  <fct> <int>
#1 A        39
#2 B        32
#3 C        29
#
#$b
## A tibble: 8 x 2
#  x         n
#  <fct> <int>
#1 A        14
#2 B        11
#3 C        16
#4 D        10
#5 E        12
#6 F        10
#7 G        13
#8 H        14
#...

The output is a list. Note that I have ignored the last column of data, as it doesn't seem to be relevant here.

If you want the count column in each data.frame of the list to be named after the corresponding column of your original data, you can use imap:

imap(data[1:4], ~tibble(!!.y := {{.x}}) %>% count(!!sym(.y)))
#$a
## A tibble: 3 x 2
#  a         n
#  <fct> <int>
#1 A        23
#2 B        35
#3 C        42
#
#$b
## A tibble: 8 x 2
#  b         n
#  <fct> <int>
#1 A        15
#2 B        10
#3 C        13
#4 D         5
#5 E        19
#6 F         9
#7 G        13
#8 H        16
#...

Or, making use of tibble::enframe (thanks @camille):

imap(data[1:4], ~enframe(.x, value = .y) %>% count(!!sym(.y)))
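If the per-group summary needs more than counts, the same map idea can be applied to the column names rather than the column values, so that the rest of the data frame stays available inside summarise. The following is a minimal sketch, assuming dplyr >= 1.0.0 (for across() and the .groups argument). summarise_each_group is a hypothetical helper name, and the seed and the reduced two-column data are assumptions made for illustration only.

```r
library(dplyr)
library(purrr)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
data <- data.frame(
  a = sample(LETTERS[1:3], 100, replace = TRUE),
  b = sample(LETTERS[1:8], 100, replace = TRUE),
  value = rnorm(100),
  stringsAsFactors = TRUE  # keep the letter columns as factors on R >= 4.0
)

# Hypothetical helper: one summary table per factor column of df.
# It assumes df has a numeric column called `value`.
summarise_each_group <- function(df) {
  group_vars <- names(df)[map_lgl(df, is.factor)]
  # set_names() names the vector by itself, so the result list is named too
  map(set_names(group_vars), function(col) {
    df %>%
      group_by(across(all_of(col))) %>%
      summarise(n = n(), mean_value = mean(value), .groups = "drop")
  })
}

res <- summarise_each_group(data)
names(res)  # one element per factor column
res$a       # counts and mean of value for each level of a
```

Because the grouping happens on the original data frame, any other summary of value (or of other columns) can be added inside summarise without changing the mapping code.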

Upvotes: 2

Vitali Avagyan

Reputation: 1203

You can simply call:

apply(data, 2, table)

This returns a list with one table per column, including value. You can drop the last list element if you don't need it.
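Note that apply first converts the data frame to a character matrix, so the numeric value column gets tabulated as strings. A possible base R variant that avoids this is to select the factor columns first and tabulate each one with lapply. This is only a sketch: the seed and the reduced two-column data are assumptions made for illustration.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
data <- data.frame(
  a = sample(LETTERS[1:3], 100, replace = TRUE),
  b = sample(LETTERS[1:8], 100, replace = TRUE),
  value = rnorm(100),
  stringsAsFactors = TRUE  # keep the letter columns as factors on R >= 4.0
)

# Filter() keeps only the columns for which is.factor() is TRUE,
# so `value` never reaches table()
tables <- lapply(Filter(is.factor, data), table)

names(tables)              # one table per factor column
as.data.frame(tables$a)    # a table converts to a data frame if needed
```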

Upvotes: 0

Calum You

Reputation: 15072

You could reshape the data and group by both the column and the letter. This gives you one dataframe instead of a list of them, but you could get the list if you really want it with split.

set.seed(123)
library(tidyverse)
data = data.frame(
  a = sample(LETTERS[1:3], 100, replace=TRUE),
  b = sample(LETTERS[1:8], 100, replace=TRUE),
  c = sample(LETTERS[3:15], 100, replace=TRUE),
  d = sample(LETTERS[16:26], 100, replace=TRUE),
  value = rnorm(100)
)

data %>%
  pivot_longer(cols = -value, names_to = "column", values_to = "letter") %>%
  group_by(column, letter) %>%
  summarise(n = n())
#> # A tibble: 35 x 3
#> # Groups:   column [4]
#>    column letter     n
#>    <chr>  <fct>  <int>
#>  1 a      A         33
#>  2 a      B         32
#>  3 a      C         35
#>  4 b      A          8
#>  5 b      B         11
#>  6 b      C         12
#>  7 b      D         14
#>  8 b      E          8
#>  9 b      F         17
#> 10 b      G         16
#> # … with 25 more rows

Created on 2019-10-30 by the reprex package (v0.3.0)
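To get back a list with one table per original column, the long summary can be passed to split. The following is a minimal sketch, assuming tidyr >= 1.0.0 for pivot_longer. It uses count(column, letter) as a shorthand for the group_by plus summarise step, and the reduced two-column data with character (not factor) columns is an assumption made to keep the example small.

```r
library(dplyr)
library(tidyr)

set.seed(123)
data <- data.frame(
  a = sample(LETTERS[1:3], 100, replace = TRUE),
  b = sample(LETTERS[1:8], 100, replace = TRUE),
  value = rnorm(100),
  stringsAsFactors = FALSE  # assumption: plain character columns
)

# Long format: one row per (original column, letter) pair, then count
counts <- data %>%
  pivot_longer(cols = -value, names_to = "column", values_to = "letter") %>%
  count(column, letter)

# split() returns a named list with one tibble per original column
counts_list <- split(counts, counts$column)

names(counts_list)
counts_list$a
```

Each element still carries the column variable; it can be dropped afterwards with select(-column) if it is not needed.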

Upvotes: 1
