Reputation: 115
dat <- data.frame(A = c("r","t","y","g","r"),
B = c("g","r","r","t","y"),
C = c("t","g","t","r","t"))
  A B C
1 r g t
2 t r g
3 y r t
4 g t r
5 r y t
I would like to count the combinations of characters that occur together in a row across the three columns, ignoring order, e.g.
Combinations Freq
r t g        3
y t r        2
If I also wanted frequency counts of a nominal variable (e.g. gender) within each combination, how might I do that?
e.g.
dat <- data.frame(A = c("r","t","y","g","r"),
B = c("g","r","r","t","y"),
C = c("t","g","t","r","t"),
Gender = c("male", "female", "female", "male", "male"))
dat
  A B C Gender
1 r g t   male
2 t r g female
3 y r t female
4 g t r   male
5 r y t   male
To get this:
Combinations Freq Male Female
r t g        3    2    1
y t r        2    1    1
Upvotes: 4
Views: 754
Reputation: 66819
You could do...
data.frame(table(combo = sapply(split(as.matrix(dat), row(dat)),
function(x) paste(sort(x), collapse=" "))))
  combo Freq
1 g r t    3
2 r t y    2
For readability, I'd suggest doing it in multiple lines and/or using magrittr:
d = as.matrix(dat)
library(magrittr)
d %>% split(., row(.)) %>% sapply(
. %>% sort %>% paste(collapse = " ")
) %>% table(combo = .) %>% data.frame
  combo Freq
1 g r t    3
2 r t y    2
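The same idea can be unpacked step by step in base R: sort within each row so that order stops mattering, paste the sorted values into one key, then tabulate the keys. This is only a sketch; the names `keys` and `res` are made up here, and `apply()` is swapped in for the `split()`/`row()` idiom above.

```r
dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"))

# One key per row: sort the row's values, then collapse them into a string
# (keys and res are illustrative names, not from the answer above)
keys <- apply(as.matrix(dat), 1, function(x) paste(sort(x), collapse = " "))

# Count how often each key occurs
res <- as.data.frame(table(combo = keys))
res
```

Sorting before pasting is what makes `r g t`, `t r g` and `g t r` collapse into the same key.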
Re the edit / new question, I'd take a somewhat different approach, maybe like...
# new example data
dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"),
                  Gender = c("male", "female", "female", "male", "male"))
library(data.table)
library(magrittr) # provides %>%, which is used below
setDT(dat)
dat[, combo := sapply(transpose(.SD),
. %>% sort %>% paste(collapse = " ")), .SDcols=A:C]
dat[, c(
n = .N,
Gender %>% factor(levels=c("male", "female")) %>% table %>% as.list
), by=combo]
   combo n male female
1: g r t 3    2      1
2: r t y 2    1      1
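For comparison, a base R sketch of the same per-combination gender counts, built on a two-way `table()` rather than data.table. The names `combo`, `gender`, `tab` and `out` are illustrative, and the factor levels are fixed the same way as above so that both columns appear even if a level has no rows.

```r
dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"),
                  Gender = c("male", "female", "female", "male", "male"))

# Order-independent key per row, built from columns A to C only
combo <- apply(as.matrix(dat[, c("A", "B", "C")]), 1,
               function(x) paste(sort(x), collapse = " "))

# Fix the level order so the columns come out as male, female
gender <- factor(dat$Gender, levels = c("male", "female"))

# Cross-tabulate combination against gender
tab <- table(combo, gender)

# Assemble one row per combination: total count plus per-gender counts
out <- data.frame(combo  = rownames(tab),
                  n      = as.vector(rowSums(tab)),
                  male   = as.vector(tab[, "male"]),
                  female = as.vector(tab[, "female"]))
out
```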
Upvotes: 5
Reputation: 28675
library(tidyverse)
# dat here is the original three-column data frame (A, B, C)
dat %>%
pmap_dfr(~list(...)[order(c(...))] %>% set_names(names(dat))) %>%
group_by_all %>%
count
# # A tibble: 2 x 4
# # Groups:   A, B, C [2]
#   A     B     C         n
#   <chr> <chr> <chr> <int>
# 1 g     r     t         3
# 2 r     t     y         2
Upvotes: 2