SeekingData

Reputation: 115

Count combinations by column, order doesn't matter

dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"))

  A B C
1 r g t
2 t r g
3 y r t
4 g t r
5 r y t

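I would like to count how often each set of characters occurs together across the three columns, treating rows that contain the same characters in a different order as the same combination. For example: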

Combinations  Freq
r t g         3
y t r         2

As a follow-up: how could I also break each combination's count down by a nominal variable (e.g. gender)?

For example, given:

dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"),
             Gender = c("male", "female", "female", "male", "male"))

dat

  A B C Gender
1 r g t   male
2 t r g female
3 y r t female
4 g t r   male
5 r y t   male

To get this:

Combinations  Freq  Male  Female
r t g         3     2     1
y t r         2     1     1

Upvotes: 4

Views: 754

Answers (2)

Frank

Reputation: 66819

You could sort the values within each row, paste them into a single key, and tabulate the keys:

data.frame(table(combo = sapply(split(as.matrix(dat), row(dat)), 
  function(x) paste(sort(x), collapse=" "))))

  combo Freq
1 g r t    3
2 r t y    2

For readability, I'd suggest doing it in multiple lines and/or using magrittr:

d = as.matrix(dat)
library(magrittr)

d %>% split(., row(.)) %>% sapply(
  . %>% sort %>% paste(collapse = " ")
) %>% table(combo = .) %>% data.frame

  combo Freq
1 g r t    3
2 r t y    2
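If you would rather avoid the extra package, the same idea can be spelled out step by step in base R. This is only a sketch of the approach above, not a different method; the intermediate names `combo` and `res` are made up for illustration.

```r
# Example data from the question
dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"))

# Sort the values within each row so that order no longer matters,
# then paste them into a single key per row
combo <- apply(as.matrix(dat), 1, function(x) paste(sort(x), collapse = " "))

# Tabulate the keys and convert the table to a data frame
res <- as.data.frame(table(combo = combo))
res
```

Sorting before pasting is what makes "r g t" and "t r g" collapse into the same key.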

Re the edit / new question (counts by gender), I'd take a somewhat different approach using data.table, maybe like...

# new example data
dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"),
             Gender = c("male", "female", "female", "male", "male"))

library(data.table)
library(magrittr)  # for %>%
setDT(dat)

dat[, combo := sapply(transpose(.SD), 
  . %>% sort %>% paste(collapse = " ")), .SDcols=A:C]

dat[, c(
  n = .N, 
  Gender %>% factor(levels=c("male", "female")) %>% table %>% as.list
), by=combo]

   combo n male female
1: g r t 3    2      1
2: r t y 2    1      1
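For comparison, a two-way `table()` gives the same breakdown in base R. This is a rough sketch assuming the four-column example data from the question; `combo`, `tab`, `counts` and `res` are illustrative names only.

```r
# Example data from the edited question
dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"),
             Gender = c("male", "female", "female", "male", "male"))

# Build the order-independent key from columns A, B and C only
combo <- apply(as.matrix(dat[c("A", "B", "C")]), 1,
               function(x) paste(sort(x), collapse = " "))

# Cross-tabulate the key against Gender: one row per combination,
# one column per gender level
tab <- table(combo = combo, Gender = dat$Gender)
counts <- as.data.frame(unclass(tab))

# Put the overall count next to the per-gender counts
res <- data.frame(combo = rownames(counts),
                  Freq = as.vector(rowSums(tab)),
                  counts,
                  row.names = NULL)
res
```

For the example data this should agree with the data.table result: 3 rows for g r t (2 male, 1 female) and 2 rows for r t y (1 male, 1 female). The gender columns come out in alphabetical order unless `Gender` is first converted to a factor with explicit levels.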

Upvotes: 5

IceCreamToucan

Reputation: 28675

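library(tidyverse)

# Assumes dat holds only the columns A, B and C (the first example)
dat %>% 
  # sort the values within each row, keeping the original column names
  pmap_dfr(~list(...)[order(c(...))] %>% set_names(names(dat))) %>%
  # count identical sorted rows
  group_by_all %>% 
  count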

# # A tibble: 2 x 4
# # Groups:   A, B, C [2]
#   A     B     C         n
#   <chr> <chr> <chr> <int>
# 1 g     r     t         3
# 2 r     t     y         2

Upvotes: 2
