Similarity matrix to categorical variable

Question

This is probably an easy problem but I have hit a wall.

I have a binary, symmetric similarity matrix, and I want to convert it to a categorical variable.

Here is a simple example of the problem:

set.seed(100)
a = sample(letters[1:4], size = 9, replace = TRUE)
b = outer(a, a, function(x, y) as.integer(x == y))

This code takes a categorical variable a and converts it to a similarity matrix b.

What I want to do is get from b back to a.

Is this possible?

EDIT

I should note that the category labels don't need to be the same as the originals. I only need the grouping into categories.

MrFlick · Accepted Answer

Well, b has lost all the names, so it won't really be possible to know for sure which group was "a" and which was "b" and so on, but you can cluster the matching elements together. Since what you've basically created is an adjacency matrix, you can solve this with help from the igraph package. For example:

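library(igraph)
# Treat b as an adjacency matrix; each connected component is one category.
# components() is the current name for the older clusters() function.
graph_from_adjacency_matrix(b, mode = "undirected") %>%
  components() %>% {.$membership}
# [1] 1 1 2 3 1 1 4 1 2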

So we can see that the values at positions 1, 2, 5, 6, and 8 are all the same (they were originally "b") and so on. So 1=b, 2=c, 3=a, and 4=d. But again, there would be no way to get the exact same label back from b since swapping the labels in a would result in the same matrix.
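As a rough illustration of the same idea without igraph, here is a minimal base R sketch. It assumes b really is an equivalence relation (1s on the diagonal, symmetric, and transitive), which holds when b is built from a categorical vector as in the question. The names first_member, recovered, b_rebuilt, swapped, and b_swapped are made up for this example. Each element is labelled by the position of the first element it matches, and the last lines check the point about labels: permuting the labels in a leaves b unchanged.

```r
set.seed(100)
a <- sample(letters[1:4], size = 9, replace = TRUE)
b <- outer(a, a, function(x, y) as.integer(x == y))

# For each row, find the first column holding a 1. Elements in the same
# category share the same first member (assumes b is an equivalence relation).
first_member <- apply(b, 1, function(r) which(r == 1)[1])

# Renumber the groups 1, 2, 3, ... in order of first appearance.
recovered <- match(first_member, unique(first_member))

# Rebuilding the matrix from the recovered grouping should reproduce b.
b_rebuilt <- outer(recovered, recovered, function(x, y) as.integer(x == y))
identical(b, b_rebuilt)

# Permuting the labels (a <-> d, b <-> c) yields the same matrix,
# so the original label names cannot be recovered from b alone.
swapped <- chartr("abcd", "dcba", a)
b_swapped <- outer(swapped, swapped, function(x, y) as.integer(x == y))
identical(b, b_swapped)
```

If b is not guaranteed to be transitive (for example, a thresholded similarity matrix), the connected-components approach with igraph is the safer choice, since this sketch only looks at the first match in each row. Wrap recovered in factor() if an actual factor is needed.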
