dom_oh

Reputation: 867

Is there a better way to do a group_by for each value in a list?

I am trying to find the best way to iterate through each column of a data frame, group by that column, and produce a summary. Here is my attempt:

library(tidyverse)
data = data.frame(
  a = sample(LETTERS[1:3], 100, replace=TRUE),
  b = sample(LETTERS[1:8], 100, replace=TRUE),
  c = sample(LETTERS[3:15], 100, replace=TRUE),
  d = sample(LETTERS[16:26], 100, replace=TRUE),
  value = rnorm(100)
)

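myfunction <- function(x) {
  # Names of the factor columns; each one is used as a grouping variable in turn
  groupVars <- select_if(x, is.factor) %>% colnames()
  results <- list()
  # seq_along() is safe when there are no factor columns (1:length() would give c(1, 0))
  for (i in seq_along(groupVars)) {
    results[[i]] <- x %>%
      group_by_at(.vars = vars(groupVars[i])) %>%
      summarise(n = n())
  }
  return(results)
}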

test <- myfunction(data)

The function returns:

[[1]]
# A tibble: 3 x 2
  a         n
  <fct> <int>
1 A        37
2 B        34
3 C        29
...
...
...

My question is, is this the best way to do this? Is there a way to avoid using a for loop? Can I use purrr and map somehow to do this?

Thank you

Upvotes: 0

Views: 129

Answers (3)

Maurits Evers

Reputation: 50678

One option is to use purrr::map:

library(tidyverse)
map(data[1:4], ~data.frame(x = {{.x}}) %>% count(x))
#$a
## A tibble: 3 x 2
#  x         n
#  <fct> <int>
#1 A        39
#2 B        32
#3 C        29
#
#$b
## A tibble: 8 x 2
#  x         n
#  <fct> <int>
#1 A        14
#2 B        11
#3 C        16
#4 D        10
#5 E        12
#6 F        10
#7 G        13
#8 H        14
#...

The output is a named list with one table of counts per column. Note that data[1:4] drops the last column of data (value), as it doesn't seem to be relevant here.


If you want the first column of each data.frame in the list to carry the name of the corresponding column in your original data, you can use imap:

imap(data[1:4], ~tibble(!!.y := {{.x}}) %>% count(!!sym(.y)))
#$a
## A tibble: 3 x 2
#  a         n
#  <fct> <int>
#1 A        23
#2 B        35
#3 C        42
#
#$b
## A tibble: 8 x 2
#  b         n
#  <fct> <int>
#1 A        15
#2 B        10
#3 C        13
#4 D         5
#5 E        19
#6 F         9
#7 G        13
#8 H        16
#...

Or, making use of tibble::enframe (thanks @camille):

imap(data[1:4], ~enframe(.x, value = .y) %>% count(!!sym(.y)))
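
For comparison, here is a minimal sketch that keeps the shape of the function from the question but replaces the for loop with map. It assumes dplyr 1.0 or later (for across and the .groups argument); the function name count_by_each and the small two-column example data are hypothetical and only there for illustration.

```r
library(dplyr)
library(purrr)

# Example data; stringsAsFactors = TRUE so the grouping columns are factors
# (an assumption, since R >= 4.0 no longer converts strings by default)
set.seed(1)
data <- data.frame(
  a = sample(LETTERS[1:3], 100, replace = TRUE),
  b = sample(LETTERS[1:8], 100, replace = TRUE),
  value = rnorm(100),
  stringsAsFactors = TRUE
)

# Hypothetical helper: one summary table per factor column
count_by_each <- function(x) {
  # Names of the factor columns
  group_vars <- names(x)[vapply(x, is.factor, logical(1))]
  group_vars %>%
    set_names() %>%            # name the list elements after the columns
    map(function(v) {
      x %>%
        group_by(across(all_of(v))) %>%
        summarise(n = n(), .groups = "drop")
    })
}

results <- count_by_each(data)
names(results)
```

Because the list is named, a single summary can be pulled out with results$a or results[["b"]].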

Upvotes: 2

Vitali Avagyan

Reputation: 1203

You can simply call:

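apply(data, 2, table)

This returns a list with one table of counts per column. The last element is the table for the value column, which you can drop if you don't need it.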
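
A closely related base R sketch, offered as an alternative rather than as this answer's method, uses lapply on only the factor columns, so the value column never enters the result. The variable names group_cols and tables, and the small example data, are assumptions for illustration.

```r
# Example data; stringsAsFactors = TRUE so the grouping columns are factors
set.seed(1)
data <- data.frame(
  a = sample(LETTERS[1:3], 100, replace = TRUE),
  b = sample(LETTERS[1:8], 100, replace = TRUE),
  value = rnorm(100),
  stringsAsFactors = TRUE
)

# Keep only the factor columns, then tabulate each one
group_cols <- names(data)[vapply(data, is.factor, logical(1))]
tables <- lapply(data[group_cols], table)

names(tables)
```

Unlike apply, lapply does not first convert the data frame to a character matrix, so each column is tabulated as it is stored.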

Upvotes: 0

Calum You

Reputation: 15072

You could reshape the data to long format and group by both the column and the letter. This gives you one data frame instead of a list of them; if you really want the list, you can get it with split.

set.seed(123)
library(tidyverse)
data = data.frame(
  a = sample(LETTERS[1:3], 100, replace=TRUE),
  b = sample(LETTERS[1:8], 100, replace=TRUE),
  c = sample(LETTERS[3:15], 100, replace=TRUE),
  d = sample(LETTERS[16:26], 100, replace=TRUE),
  value = rnorm(100)
)

data %>%
  pivot_longer(cols = -value, names_to = "column", values_to = "letter") %>%
  group_by(column, letter) %>%
  summarise(n = n())
#> # A tibble: 35 x 3
#> # Groups:   column [4]
#>    column letter     n
#>    <chr>  <fct>  <int>
#>  1 a      A         33
#>  2 a      B         32
#>  3 a      C         35
#>  4 b      A          8
#>  5 b      B         11
#>  6 b      C         12
#>  7 b      D         14
#>  8 b      E          8
#>  9 b      F         17
#> 10 b      G         16
#> # … with 25 more rows

Created on 2019-10-30 by the reprex package (v0.3.0)
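
The split step mentioned above could look like the following sketch. It assumes tidyr 1.0 or later for pivot_longer, uses count as shorthand for group_by plus summarise(n = n()), and keeps the letter columns as character vectors for simplicity; the names counts and by_column are hypothetical.

```r
library(dplyr)
library(tidyr)

set.seed(123)
data <- data.frame(
  a = sample(LETTERS[1:3], 100, replace = TRUE),
  b = sample(LETTERS[1:8], 100, replace = TRUE),
  value = rnorm(100),
  stringsAsFactors = FALSE
)

# One long table of counts per (column, letter) pair
counts <- data %>%
  pivot_longer(cols = -value, names_to = "column", values_to = "letter") %>%
  count(column, letter)

# Split the long table into a named list, one element per original column
by_column <- split(counts, counts$column)

names(by_column)
```

Each element of by_column still contains the column variable; drop it with select(-column) if only the letter and n columns are wanted.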

Upvotes: 1
