RuddThreeTrees

Reputation: 101

Finding feasible combinations in dataframe R / combinatorics

I have the following challenge: a data frame with 218 observations (rows) and 218 variables (columns). The values are either TRUE or FALSE. I need to find combinations of variables (columns) that are all TRUE together in at least 2 rows.

Here is a little example:

data <- data.frame(matrix(FALSE, nrow = 3, ncol = 5))
colnames(data) = paste("item_", 1:5, sep = "")
rownames(data) = paste("Process_", 1:3, sep = "")
data["Process_1",c("item_1","item_2","item_3")] = TRUE
data["Process_2",c("item_2","item_3")] = TRUE
data["Process_3",c("item_1","item_2","item_3","item_4","item_5")] = TRUE

For this example, the feasible combinations (the result I want to find) are:

c1: item_1, item_2, item_3

c2: item_2, item_3

c3: item_1, item_2

c4: item_1, item_3

Thank you very much for an answer or a hint :)

Cheers

Upvotes: 2

Views: 43

Answers (1)

Wimpel

Reputation: 27732

# all items that are TRUE in 2 or more rows
items <- names(which(colSums(data) >= 2))
# all possible combinations of 2 (or more) of those items
lapply(2:length(items), function(x) combn(items, x))
# [[1]]
#          [,1]     [,2]     [,3]    
# [1,] "item_1" "item_1" "item_2"
# [2,] "item_2" "item_3" "item_3"
# 
# [[2]]
#          [,1]    
# [1,] "item_1"
# [2,] "item_2"
# [3,] "item_3"
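
Note that the code above only enumerates candidate combinations; it does not check that the items of a combination are TRUE together in the same rows. On the example data every candidate happens to qualify, but that will not hold in general. A minimal sketch of that extra check follows, assuming "feasible" means all items of the combination are TRUE in at least 2 common rows. The helper `support()` and the variables `m` and `feasible` are names made up for this illustration, not part of the question.

```r
# Rebuild the example data from the question
data <- data.frame(matrix(FALSE, nrow = 3, ncol = 5))
colnames(data) <- paste0("item_", 1:5)
rownames(data) <- paste0("Process_", 1:3)
data["Process_1", c("item_1", "item_2", "item_3")] <- TRUE
data["Process_2", c("item_2", "item_3")] <- TRUE
data["Process_3", paste0("item_", 1:5)] <- TRUE

m <- as.matrix(data)                     # logical matrix, rows = processes
items <- names(which(colSums(m) >= 2))   # candidate items, as in the answer

# support(): hypothetical helper, counts rows in which ALL given items are TRUE
support <- function(combo) {
  sum(rowSums(m[, combo, drop = FALSE]) == length(combo))
}

# For each size k >= 2, enumerate the combinations and keep those
# that occur together in at least 2 rows
feasible <- list()
for (k in seq_along(items)[-1]) {        # sizes 2..length(items); empty if < 2 items
  combos <- combn(items, k, simplify = FALSE)
  keep <- vapply(combos, function(cb) support(cb) >= 2, logical(1))
  feasible <- c(feasible, combos[keep])
}
feasible
```

On the example this keeps the four combinations listed in the question. With 218 columns, enumerating every size this way will probably be too slow, since the number of combinations grows exponentially. The task looks like frequent itemset mining, so an Apriori-style approach (for example the `arules` package) may be worth a look.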

Upvotes: 2
