Rohan Bali

Reputation: 119

Identifying similar items from a matrix in R

I have a matrix like this:

a <- c(0,45,19,48,28,19,0,0,62,3,61,62,0,0,0,63,29,0,0,0,0,0,62,63,0,0,0,0,0,29,0,0,0,0,0,0)
mat1 <- matrix(a,6,6,byrow = TRUE)
> mat1
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0   45   19   48   28   19
[2,]    0    0   62    3   61   62
[3,]    0    0    0   63   29    0
[4,]    0    0    0    0   62   63
[5,]    0    0    0    0    0   29
[6,]    0    0    0    0    0    0

Now, if a cell has a nonzero value less than 30, the items for the corresponding row and column are considered the same/similar. For example, [1,3] is 19, so items 1 and 3 are similar.

So for each row, we find the combinations that are similar (i.e. the cell is nonzero and less than 30).

Row 1 : [1,3], [1,5], [1,6]

Row 2 : [2,4]

Row 3 : [3,5]

Row 4 : none

Row 5 : [5,6]

Row 6 : none

So the similar combinations are [1,3], [1,5], [1,6], [2,4], [3,5] and [5,6]. The result should count the similar groups with transitive links merged. Here [1,3], [1,5], [1,6], [3,5] and [5,6] all connect items 1, 3, 5 and 6, so together they count as 1, and the combination [2,4] counts as 1. Hence the total for this matrix is 2.

There are multiple matrices of order n x m, so the solution should adapt to any number of rows and columns.

Upvotes: 1

Views: 118

Answers (2)

ThomasIsCoding

Reputation: 101753

Here is an igraph option. I hope it works for your purpose (I guess you are looking for the number of clusters, i.e. connected components).

library(igraph)

components(
  graph_from_data_frame(
    data.frame(which(mat1 != 0 & mat1 < 30, arr.ind = TRUE))
  )
)$no

which gives

[1] 2

Details

  1. Create a data frame of the matching indices
> data.frame(which(mat1 != 0 & mat1 < 30, arr.ind = TRUE))
  row col
1   1   3
2   2   4
3   1   5
4   3   5
5   1   6
6   5   6
  2. Construct the graph
g <- graph_from_data_frame(
  data.frame(which(mat1 != 0 & mat1 < 30, arr.ind = TRUE))
)
plot(g)

  3. View the clusters
> components(g)
$membership
1 2 3 5 4 6
1 2 1 1 2 1

$csize
[1] 4 2

$no
[1] 2
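
Since the question mentions several matrices of different sizes, the same idea can be wrapped in a function and applied to a list. This is only a sketch: the name count_similar_groups, the threshold argument and the second matrix mat2 are assumptions for illustration, not part of the original answer. It also assumes that row i and column j refer to the same item numbering, and it returns 0 when no cell qualifies.

```r
library(igraph)

# Hypothetical helper: number of groups of transitively similar items.
# Assumes row i and column j index the same set of items.
count_similar_groups <- function(mat, threshold = 30) {
  idx <- which(mat != 0 & mat < threshold, arr.ind = TRUE)
  if (nrow(idx) == 0) return(0)  # no similar pairs, so no groups
  g <- graph_from_data_frame(as.data.frame(idx), directed = FALSE)
  components(g)$no
}

a <- c(0,45,19,48,28,19,0,0,62,3,61,62,0,0,0,63,29,0,
       0,0,0,0,62,63,0,0,0,0,0,29,0,0,0,0,0,0)
mat1 <- matrix(a, 6, 6, byrow = TRUE)

# Made-up second matrix, only to show a different size
mat2 <- matrix(c(0, 10, 50, 50,
                 0,  0, 50, 50,
                 0,  0,  0, 20,
                 0,  0,  0,  0), 4, 4, byrow = TRUE)

# One count per matrix
sapply(list(mat1, mat2), count_similar_groups)
```

Items that are not similar to anything never enter the graph, so they are not counted as groups of their own, which matches the expected result of 2 in the question.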

Upvotes: 1

rdodhia

Reputation: 350

This will output a list of combinations.

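library(data.table)

# (row, col) indices of nonzero cells below the threshold
x <- data.table(which(mat1 < 30 & mat1 > 0, arr.ind = TRUE))
setkey(x, row)
x <- x[!(row == col)]  # drop diagonal (self) pairs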

s <- list()
for (j in unique(x$row)) {
  s[j] <- list(NULL)
  temp <- x[row == j, col]
  for (i in temp) {
    s[[j]] <- cbind(s[[j]], c(j, i))
    for (k in x[row == i, col]) {
      if (k %in% c(temp, j)) s[[j]] <- cbind(s[[j]], c(i, k))
    }
    x <- x[!(row == i & col %in% c(temp, j))]
  }
}

s
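
The question asks for a count of groups rather than the pairs themselves. As a dependency-free sketch (not part of the answer above), the pairs can be merged with a small union-find in base R. The function name count_groups_base and its threshold argument are assumptions for illustration, and rows and columns are assumed to index the same items.

```r
# Hypothetical helper: count groups of transitively similar items in base R
count_groups_base <- function(mat, threshold = 30) {
  pairs <- which(mat != 0 & mat < threshold, arr.ind = TRUE)
  if (nrow(pairs) == 0) return(0L)

  # Each item starts as its own root
  parent <- seq_len(max(dim(mat)))
  find_root <- function(v) {
    while (parent[v] != v) v <- parent[v]
    v
  }

  # Merge the two groups joined by each similar pair
  for (r in seq_len(nrow(pairs))) {
    root_a <- find_root(pairs[r, 1])
    root_b <- find_root(pairs[r, 2])
    if (root_a != root_b) parent[root_b] <- root_a
  }

  # Count distinct roots among items that appear in at least one pair
  items <- unique(as.vector(pairs))
  length(unique(vapply(items, find_root, integer(1))))
}

a <- c(0,45,19,48,28,19,0,0,62,3,61,62,0,0,0,63,29,0,
       0,0,0,0,62,63,0,0,0,0,0,29,0,0,0,0,0,0)
mat1 <- matrix(a, 6, 6, byrow = TRUE)
count_groups_base(mat1)
```

For the matrix in the question this gives 2: one group containing items 1, 3, 5 and 6, and one containing items 2 and 4.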

Upvotes: 1
