Ratnanil

Reputation: 1752

Test all values against each other and form groups from resulting matrix

I feel as if I'm asking the wrong questions and trying to reinvent the wheel. What am I missing?

I have a set of values, let's say 8, that I need to test against each other. I have built a function that returns a matrix stating whether any two values are in the same group. For lack of a better idea, let me paste the output here:

data.text <-
"1     2     3     4     5     6     7     8
1  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
2  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
3  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
4 FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
5 FALSE FALSE FALSE FALSE  TRUE  TRUE    NA FALSE
6 FALSE FALSE FALSE FALSE  TRUE  TRUE    NA FALSE
7 FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
8 FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE"

data <- read.table(text=data.text, header = TRUE)
data <- as.matrix(data)
colnames(data) <- 1:8

Row 1 says that value 1 is in a group with itself (column 1) and with values 2 and 3, but not with values 4 to 8. Values 5 and 6 are in the same group as well.

I am trying to use this information to create individual group IDs and, for each group, a vector of all its elements.

What I've done so far:

# row and column index of every TRUE value (NA entries are skipped by which())
groups <- which(data, arr.ind = TRUE)

# sort each row in ascending order so that mirrored pairs become duplicates
groups.sorted <- t(apply(groups, 1, sort))

# drop double statements ("1 and 2", "2 and 1")
groups.unique <- unique(groups.sorted)

# drop obvious information ("1 and 1"); drop = FALSE keeps a matrix even if one row remains
groups.real <- groups.unique[groups.unique[, 1] != groups.unique[, 2], , drop = FALSE]

At this point I'm stuck. How do I automatically detect that values 1, 2 and 3 belong to the same group?

All in all, I feel I'm going at this rather clumsily. Can anybody point me to a more elegant way?

Upvotes: 4

Views: 107

Answers (3)

Cath

Reputation: 24074

Another way, using base R:

groups <- unique(lapply(apply(data, 2, which), unique))
names(groups) <- seq_along(groups)
groups
#$`1`
#[1] 1 2 3

#$`2`
#[1] 4

#$`3`
#[1] 5 6

#$`4`
#[1] 7

#$`5`
#[1] 8

If you want to get the group indices of each element, you can do it with stack:

stack(groups)
#  values ind
#1      1   1
#2      2   1
#3      3   1
#4      4   2
#5      5   3
#6      6   3
#7      7   4
#8      8   5
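
If a plain membership vector is more convenient than the stacked data frame, it can be derived from the same list. The following is a minimal sketch, assuming `groups` has the shape shown above; the list is rebuilt by hand so the snippet runs on its own, and `membership` is just an illustrative name:

```r
# Rebuild the list from the output above so the snippet is self-contained
groups <- list(`1` = 1:3, `2` = 4L, `3` = 5:6, `4` = 7L, `5` = 8L)

# Repeat each group ID once per member, then name the entries by element
membership <- rep(seq_along(groups), lengths(groups))
names(membership) <- unlist(groups, use.names = FALSE)
membership
```

Note that this whole approach relies on every column already listing its complete group, as it does in the example. If 1-2 and 2-3 were TRUE but 1-3 were not, the columns would differ and a connected-components approach would be needed instead.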

Upvotes: 3

nicola

Reputation: 24480

I'd use the igraph package for this sort of thing:

require(igraph)
components(graph_from_adjacency_matrix(data))$membership
#1 2 3 4 5 6 7 8 
#1 1 1 2 3 3 4 5

You obtain a named vector whose names are the elements and whose values are the groups they belong to.
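
Since the question also asks for the elements of each group, the named vector can be turned into one list entry per group with `split()`. A minimal sketch in base R; the membership vector is copied by hand from the output above so the snippet runs without igraph, and `members` is an illustrative name:

```r
# Membership vector copied from the output above, so igraph is not needed here
membership <- c(`1` = 1, `2` = 1, `3` = 1, `4` = 2, `5` = 3, `6` = 3, `7` = 4, `8` = 5)

# One list entry per group ID, each holding the names of its elements
members <- split(names(membership), membership)
members[["3"]]
```

The element names come back as character strings; wrap them in `as.integer()` if numeric IDs are needed.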

Upvotes: 6

Ven Yao

Reputation: 3710

This is really a graph problem: the groups are the connected components of the graph whose edges are the TRUE entries.

library(igraph)

graph.dat <- graph.data.frame(which(data, arr.ind = TRUE), directed = FALSE)
V(graph.dat)$label <- V(graph.dat)$name
V(graph.dat)$degree <- degree(graph.dat)
clusters(graph.dat, mode="weak")$membership
# 1 2 3 4 5 6 7 8 
# 1 1 1 2 3 3 4 5 
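
For readers who want to see what "connected components" means here without installing igraph, the idea can be sketched in base R. This is only an illustration, not what igraph does internally: `component_ids` is a hypothetical helper, it assumes that an `NA` entry means "no link", and it uses simple label propagation, which is fine for small matrices but not tuned for large ones.

```r
# Hypothetical helper (not part of igraph or base R): label the connected
# components of a logical adjacency matrix by repeated label propagation.
component_ids <- function(adj) {
  adj[is.na(adj)] <- FALSE      # assumption: an NA means "no link"
  adj <- adj | t(adj)           # treat links as undirected
  diag(adj) <- TRUE             # every element is linked to itself
  labels <- seq_len(nrow(adj))  # start with one label per element
  repeat {
    # each element takes the smallest label among its neighbours
    new_labels <- vapply(
      seq_len(nrow(adj)),
      function(i) min(labels[adj[i, ]]),
      integer(1)
    )
    if (identical(new_labels, labels)) break
    labels <- new_labels
  }
  # renumber the surviving labels as 1, 2, 3, ...
  match(labels, unique(labels))
}

# Rebuild the matrix from the question, including the two NA entries
adj <- matrix(FALSE, 8, 8)
adj[1:3, 1:3] <- TRUE
adj[5:6, 5:6] <- TRUE
diag(adj) <- TRUE
adj[5:6, 7] <- NA

component_ids(adj)
# [1] 1 1 1 2 3 3 4 5
```

Unlike comparing columns directly, this also merges chains: if only 1-2 and 2-3 are linked, all three values still end up in one group.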

Upvotes: 4
