catindri

Reputation: 384

Similarity index

Hi,

I would like to calculate a similarity index that gives +1 when two rows have the same value and -1 when they do not.

  dataR<- read.table(text='
    echant  espece
    ech1    esp1
    ech2    esp2
    ech3    esp2
    ech4    esp3
    ech5    esp3
    ech6    esp4
    ech7    esp4', header=TRUE)

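I would like to get a square matrix with one row and one column per echant value, with +1 where two samples share the same espece and -1 otherwise (NA on the diagonal would also be fine; it does not really matter). (Image of the expected matrix not available.)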

I tried the simil function from the proxy package:

library(proxy)
trst <- read.table("Rtest_simil.csv", header = T, sep = ",", dec = ".")
is.numeric(trst[, 2])
as.numeric(trst[, 2]) # the column "espece" becomes numeric
sim <- simil(trst, diag = TRUE)

But the result is not exactly what I wanted: 1) the similarity between ech2 and ech3, for example, is 0.5, the diagonal is 0, and pairs with no similarity are also 0; 2) the ech labels are lost; 3) additionally, I cannot save the result in .csv format.


Does anyone have any advice? Thanks a lot!

Upvotes: 0

Views: 60

Answers (2)

RHertel

Reputation: 23788

The matrix described in the post can be obtained with:

same.mat <- outer(dataR$espece, dataR$espece, "==") * 2 - 1

To label the rows and columns with the sample names, as described in the post, one can use rownames and colnames.

rownames(same.mat) <- colnames(same.mat) <- dataR$echant
same.mat
#     ech1 ech2 ech3 ech4 ech5 ech6 ech7
#ech1    1   -1   -1   -1   -1   -1   -1
#ech2   -1    1    1   -1   -1   -1   -1
#ech3   -1    1    1   -1   -1   -1   -1
#ech4   -1   -1   -1    1    1   -1   -1
#ech5   -1   -1   -1    1    1   -1   -1
#ech6   -1   -1   -1   -1   -1    1    1
#ech7   -1   -1   -1   -1   -1    1    1
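
For the two remaining points in the question (NA on the diagonal and saving to .csv), here is a minimal sketch building on the matrix above. The names same.na, out_file and back, and the file name same_mat.csv, are assumptions for illustration only and do not come from the original post.

```r
# Rebuild the example data from the question
dataR <- read.table(text = '
    echant  espece
    ech1    esp1
    ech2    esp2
    ech3    esp2
    ech4    esp3
    ech5    esp3
    ech6    esp4
    ech7    esp4', header = TRUE)

# The pairwise comparison yields TRUE/FALSE, which act as 1/0 in arithmetic,
# so "* 2 - 1" maps them to +1/-1
same.mat <- outer(dataR$espece, dataR$espece, "==") * 2 - 1
rownames(same.mat) <- colnames(same.mat) <- dataR$echant

# Optional: NA on the diagonal instead of +1
same.na <- same.mat
diag(same.na) <- NA

# Save the matrix; row names (the sample labels) are written by default.
# The file name and location are hypothetical.
out_file <- file.path(tempdir(), "same_mat.csv")
write.csv(same.na, out_file)

# Read it back; row.names = 1 restores the labels from the first column
back <- as.matrix(read.csv(out_file, row.names = 1))
```

Since same.mat is an ordinary numeric matrix, write.csv handles it directly, and the NA entries are written as NA unless the na argument is changed.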

An alternative approach could be:

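# factor() is needed first: since R 4.0, read.table returns character columns by default
same.mat <- (as.matrix(dist(as.numeric(factor(dataR$espece)))) == 0) * 2 - 1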
rownames(same.mat) <- colnames(same.mat) <- dataR$echant

Upvotes: 2

hrbrmstr

Reputation: 78792

There are no doubt more compact ways to do this:

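library(tidyr)
library(dplyr) # provides %>% and mutate_at()
# Map missing cells to -1 and filled cells to +1
same <- function(x) { ifelse(is.na(x), -1, 1) }
spread(dataR, espece, espece) %>%
  mutate_at(vars(-echant), same) # funs() is deprecated; pass the function directly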
##   echant esp1 esp2 esp3 esp4
## 1   ech1    1   -1   -1   -1
## 2   ech2   -1    1   -1   -1
## 3   ech3   -1    1   -1   -1
## 4   ech4   -1   -1    1   -1
## 5   ech5   -1   -1    1   -1
## 6   ech6   -1   -1   -1    1
## 7   ech7   -1   -1   -1    1
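
As a possible base R counterpart to the table above, here is a minimal sketch that assumes each sample appears exactly once with a single species. The names inc, indic and sim are hypothetical. It also shows how the sample-by-species table relates to the sample-by-sample matrix asked for in the question.

```r
# Rebuild the example data from the question
dataR <- read.table(text = '
    echant  espece
    ech1    esp1
    ech2    esp2
    ech3    esp2
    ech4    esp3
    ech5    esp3
    ech6    esp4
    ech7    esp4', header = TRUE)

# Sample-by-species counts: 1 where the sample has that species, 0 otherwise
inc <- unclass(table(dataR$echant, dataR$espece))

# Rescale 1/0 to +1/-1 to get the same layout as the spread() result
indic <- inc * 2 - 1

# inc %*% t(inc) is 1 for two samples sharing a species and 0 otherwise
# (valid only under the one-species-per-sample assumption), so the same
# rescaling gives the sample-by-sample +1/-1 matrix
sim <- tcrossprod(inc) * 2 - 1
```

Note that table() orders rows and columns by sorted label, so with other sample names the row order may differ from the order in dataR; the labels themselves are kept as dimnames.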

Upvotes: 1
