Reputation: 101
I have the following challenge: a data frame with 218 observations (rows) and 218 variables (columns). The values are either TRUE or FALSE. Now I need to find the combinations of variables (columns) that are all TRUE together in at least 2 rows.
Here is a little example:
data <- data.frame(matrix(FALSE, nrow = 3, ncol = 5))
colnames(data) = paste("item_", 1:5, sep = "")
rownames(data) = paste("Process_", 1:3, sep = "")
data["Process_1",c("item_1","item_2","item_3")] = TRUE
data["Process_2",c("item_2","item_3")] = TRUE
data["Process_3",c("item_1","item_2","item_3","item_4","item_5")] = TRUE
For this example, the combinations I want to find are:
c1: item_1, item_2, item_3
c2: item_2, item_3
c3: item_1, item_2
c4: item_1, item_3
Thank you very much for an answer or a hint :)
Cheers
Upvotes: 2
Views: 43
Reputation: 27732
#all items that have TRUE in 2 or more rows
items <- names(which(colSums(data) >= 2))
# all possible combinations of 2 (or more) items
lapply(2:length(items), function(x) combn(items, x))
# [[1]]
#      [,1]     [,2]     [,3]
# [1,] "item_1" "item_1" "item_2"
# [2,] "item_2" "item_3" "item_3"
#
# [[2]]
#      [,1]
# [1,] "item_1"
# [2,] "item_2"
# [3,] "item_3"
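Note that the code above only checks each item on its own, so on other data it could list combinations whose items never occur together in 2 rows. A minimal sketch of one way to add that check, assuming the example data from the question; the names `min_rows`, `support`, `candidates` and `keep` are made up for illustration:

```r
# Rebuild the example data from the question
data <- data.frame(matrix(FALSE, nrow = 3, ncol = 5))
colnames(data) <- paste0("item_", 1:5)
rownames(data) <- paste0("Process_", 1:3)
data["Process_1", c("item_1", "item_2", "item_3")] <- TRUE
data["Process_2", c("item_2", "item_3")] <- TRUE
data["Process_3", paste0("item_", 1:5)] <- TRUE

min_rows <- 2          # assumed threshold: "at least 2 rows"
m <- as.matrix(data)   # logical matrix, easier to subset by column

# Items that are TRUE in at least min_rows rows (same first step as above)
items <- names(which(colSums(m) >= min_rows))

# Number of rows in which ALL columns of a combination are TRUE
support <- function(cols) {
  sum(rowSums(m[, cols, drop = FALSE]) == length(cols))
}

# Every combination of 2 or more of the remaining items, as a flat list
candidates <- unlist(
  lapply(seq_len(length(items))[-1],
         function(k) combn(items, k, simplify = FALSE)),
  recursive = FALSE
)

# Keep only the combinations that really co-occur in enough rows
keep <- Filter(function(cols) support(cols) >= min_rows, candidates)
keep
```

For the example this keeps the same four combinations listed in the question. The number of candidates grows exponentially with the number of items that pass the first filter, so with 218 columns this brute-force approach is only practical if few items remain.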
Upvotes: 2