Reputation: 1752
I feel as if I'm asking the wrong questions and trying to reinvent the wheel. What am I missing?
I have a bunch of values, let's say 8, that I need to test against each other. I have built a function that returns a matrix stating whether any two values are in the same group or not. For lack of a better idea, let me paste the output here:
data.text <-
"1 2 3 4 5 6 7 8
1 TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
2 TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
3 TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
4 FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
5 FALSE FALSE FALSE FALSE TRUE TRUE NA FALSE
6 FALSE FALSE FALSE FALSE TRUE TRUE NA FALSE
7 FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
8 FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE"
data <- read.table(text=data.text, header = TRUE)
data <- as.matrix(data)
colnames(data) <- 1:8
So row 1 says that value 1 is in a group with itself (column 1) and with values 2 and 3, but not with values 4 to 8. Values 5 and 6 are in the same group as well.
I am trying to use this information to create an individual group ID for each group and a vector of all elements in that group.
What I've done so far:
# row and column index of every TRUE value (which() skips NA)
groups <- which(data, arr.ind = TRUE)
# sort each row in ascending order so that duplicate pairs become identical
groups.sorted <- t(apply(groups, 1, sort))
# drop duplicate statements ("1 and 2", "2 and 1")
groups.unique <- unique(groups.sorted)
# drop obvious information ("1 and 1"); drop = FALSE keeps a matrix even if one row remains
groups.real <- groups.unique[groups.unique[, 1] != groups.unique[, 2], , drop = FALSE]
At this point I'm stuck. How do I automate the fact that rows 1, 2 and 3 belong to the same group?
All in all, I feel I'm going about this rather clumsily. Can anybody point me to a more elegant way?
Upvotes: 4
Views: 107
Reputation: 24074
Another way, using base R:
groups <- unique(lapply(apply(data, 2, which), unique))
names(groups) <- seq(length(groups))
groups
#$`1`
#[1] 1 2 3
#$`2`
#[1] 4
#$`3`
#[1] 5 6
#$`4`
#[1] 7
#$`5`
#[1] 8
If you want to get the group index of each element, you can do it with stack:
stack(groups)
# values ind
#1 1 1
#2 2 1
#3 3 1
#4 4 2
#5 5 3
#6 6 3
#7 7 4
#8 8 5
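Note that the one-liner above works because every column of the example matrix already lists the complete group (and because which() skips the NA entries). If a matrix only marked some of the links, for instance 1-2 and 2-3 but not 1-3, the columns would differ within a group. A minimal sketch for that case, assuming NA should count as "not in the same group" and using a small made-up matrix m (not the data from the question), is to close the matrix under shared neighbours first:

```r
# Hypothetical input: 1-2 and 2-3 are linked, 1-3 is not marked, 4 is alone.
m <- diag(4) == 1
m[1, 2] <- m[2, 1] <- TRUE
m[2, 3] <- m[3, 2] <- TRUE
dimnames(m) <- list(1:4, 1:4)

# Assumption: unknown (NA) links are treated as FALSE.
m[is.na(m)] <- FALSE

# Keep adding links between values that share a neighbour until nothing changes.
closure <- m
repeat {
  nxt <- (closure %*% closure) > 0
  if (all(nxt == closure)) break
  closure <- nxt
}

# Same idea as before: the distinct columns are the groups.
# lapply over column indices always returns a list, even if all groups have equal size.
groups <- unique(lapply(seq_len(ncol(closure)),
                        function(j) unname(which(closure[, j]))))
names(groups) <- seq_along(groups)
groups
```

Here values 1, 2 and 3 end up in one group and 4 in its own.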
Upvotes: 3
Reputation: 24480
I'd use the igraph package for this sort of thing:
require(igraph)
components(graph_from_adjacency_matrix(data))$membership
#1 2 3 4 5 6 7 8
#1 1 1 2 3 3 4 5
You obtain a named vector whose names are the elements and whose values are the groups they belong to.
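If you also want one vector of elements per group, as asked in the question, a small base R sketch is to split the element names by group ID. The membership vector below is typed in by hand from the output shown above so that the snippet runs without igraph; in practice you would use the result of components() directly.

```r
# Membership copied from the output above: names are elements, values are group IDs.
membership <- c("1" = 1, "2" = 1, "3" = 1, "4" = 2,
                "5" = 3, "6" = 3, "7" = 4, "8" = 5)

# One character vector of element names per group ID.
members <- split(names(membership), membership)
members[["1"]]
members[["3"]]
```

The element names come back as character strings; wrap them in as.integer() if you need numbers.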
Upvotes: 6
Reputation: 3710
This is actually a graph problem: the groups are the connected components of the graph whose edges are the TRUE entries.
library(igraph)
graph.dat <- graph.data.frame(which(data, arr.ind = TRUE), directed = FALSE)
V(graph.dat)$label <- V(graph.dat)$name
V(graph.dat)$degree <- degree(graph.dat)
clusters(graph.dat, mode="weak")$membership
# 1 2 3 4 5 6 7 8
# 1 1 1 2 3 3 4 5
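For intuition only, here is a rough base R sketch of what the connected-components step does with the pair list from the question. It is not how igraph implements it; it simply lets both ends of every pair take the smaller group label until nothing changes. The pairs matrix is typed in by hand and is assumed to match groups.real from the question.

```r
n <- 8
# Assumed to equal groups.real from the question: the non-trivial TRUE pairs.
pairs <- rbind(c(1, 2), c(1, 3), c(2, 3), c(5, 6))

label <- seq_len(n)                  # every value starts in its own group
repeat {
  old <- label
  for (k in seq_len(nrow(pairs))) {
    # both ends of the pair take the smaller of their two labels
    label[pairs[k, ]] <- min(label[pairs[k, ]])
  }
  if (identical(label, old)) break   # stop once a full pass changes nothing
}

# Renumber the surviving labels as 1, 2, 3, ... in order of appearance.
membership <- match(label, unique(label))
names(membership) <- seq_len(n)
membership
```

For the example data this gives the same membership vector as the igraph call above.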
Upvotes: 4