Reputation: 384
Hi,
I would like to calculate a similarity index between samples (echant), so that I get +1 when two rows are similar (same espece) and -1 when they are not.
dataR<- read.table(text='
echant espece
ech1 esp1
ech2 esp2
ech3 esp2
ech4 esp3
ech5 esp3
ech6 esp4
ech7 esp4', header=TRUE)
I would like to get a samples-by-samples matrix like that (NA on the diagonal would also be fine; it does not really matter).
I tried the proxy package with its function simil:
library(proxy)
trst<-read.table("Rtest_simil.csv",header=T,sep=",",dec=".")
is.numeric(trst[,2])
trst[,2]<-as.numeric(factor(trst[,2])) #convert the column "espece" to numeric codes (as.numeric() alone does not modify trst)
sim<-simil(trst,diag=TRUE)
But the result is not exactly what I wanted: 1) the similarity between ech2 and ech3, for example, is 0.5, and the diagonal is 0, which is also the value when there is no similarity; 2) the ech labels are lost; 3) additionally, I cannot save it in .csv format.
Does anyone have any advice? Thanks a lot!
Upvotes: 0
Views: 60
Reputation: 23788
The matrix described in the post can be obtained with:
same.mat <- outer(dataR$espece, dataR$espece, "==") * 2 - 1
To assign the names to the columns and rows as described in the post one can use rownames and colnames.
rownames(same.mat) <- colnames(same.mat) <- dataR$echant
> same.mat
# ech1 ech2 ech3 ech4 ech5 ech6 ech7
#ech1 1 -1 -1 -1 -1 -1 -1
#ech2 -1 1 1 -1 -1 -1 -1
#ech3 -1 1 1 -1 -1 -1 -1
#ech4 -1 -1 -1 1 1 -1 -1
#ech5 -1 -1 -1 1 1 -1 -1
#ech6 -1 -1 -1 -1 -1 1 1
#ech7 -1 -1 -1 -1 -1 1 1
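The question also says NA on the diagonal would be acceptable and asks for a .csv file. A minimal sketch of both, assuming the dataR from the question; the names same.mat.na, out.file and back and the file name same_mat.csv are hypothetical, and the file is written to tempdir() so the example does not touch the working directory:

```r
# Same example data as in the question
dataR <- read.table(text = '
echant espece
ech1 esp1
ech2 esp2
ech3 esp2
ech4 esp3
ech5 esp3
ech6 esp4
ech7 esp4', header = TRUE)

# +1 where two samples share a species, -1 otherwise
same.mat <- outer(dataR$espece, dataR$espece, "==") * 2 - 1
rownames(same.mat) <- colnames(same.mat) <- dataR$echant

# Optional: NA on the diagonal instead of 1
same.mat.na <- same.mat
diag(same.mat.na) <- NA

# Write to CSV (hypothetical file name, placed in a temporary directory);
# write.csv() stores the row names as an unnamed first column
out.file <- file.path(tempdir(), "same_mat.csv")
write.csv(same.mat.na, out.file)

# Reading back with row.names = 1 restores the ech labels
back <- as.matrix(read.csv(out.file, row.names = 1))
back
```

In practice something like write.csv(same.mat, "same_mat.csv") is enough; the row and column labels set with rownames and colnames are kept in the file.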
An alternative approach could be:
same.mat <- (as.matrix(dist(as.numeric(factor(dataR$espece))))==0)*2 - 1 # factor() is needed when espece is read as character (the default since R 4.0.0)
rownames(same.mat) <- colnames(same.mat) <- dataR$echant
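Why the dist() variant works: factor() maps each species to one integer code, and dist() on a single numeric column returns |code_i - code_j|, which is 0 exactly when two samples share a species. A small check that both approaches agree, assuming the dataR from the question (m1, m2 and codes are hypothetical names):

```r
# Same example data as in the question
dataR <- read.table(text = '
echant espece
ech1 esp1
ech2 esp2
ech3 esp2
ech4 esp3
ech5 esp3
ech6 esp4
ech7 esp4', header = TRUE)

# Approach 1: pairwise equality of the species labels
m1 <- outer(dataR$espece, dataR$espece, "==") * 2 - 1

# Approach 2: integer species codes, then 1-D distance; 0 means same species
codes <- as.numeric(factor(dataR$espece))
m2 <- (as.matrix(dist(codes)) == 0) * 2 - 1

# Same +1/-1 pattern (m2 carries "1".."7" labels from dist(), m1 has none)
all(m1 == m2)
```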
Upvotes: 2
Reputation: 78792
There are no doubt more compact ways to do this:
library(tidyr)
library(dplyr) # provides mutate_at() and vars()
same <- function(x) { ifelse(is.na(x), -1, 1) }
spread(dataR, espece, espece) %>%
  mutate_at(vars(-echant), same) # funs() is deprecated; pass the function directly
## echant esp1 esp2 esp3 esp4
## 1 ech1 1 -1 -1 -1
## 2 ech2 -1 1 -1 -1
## 3 ech3 -1 1 -1 -1
## 4 ech4 -1 -1 1 -1
## 5 ech5 -1 -1 1 -1
## 6 ech6 -1 -1 -1 1
## 7 ech7 -1 -1 -1 1
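The table above is a samples-by-species indicator rather than the samples-by-samples matrix from the question. One way to get from one to the other (a base-R sketch that swaps in table() and tcrossprod() for spread(); ind and sim are hypothetical names) is to multiply the 0/1 indicator by its transpose: entry (i, j) counts the species shared by samples i and j, which here is 1 or 0 because each sample has exactly one species.

```r
# Same example data as in the question
dataR <- read.table(text = '
echant espece
ech1 esp1
ech2 esp2
ech3 esp2
ech4 esp3
ech5 esp3
ech6 esp4
ech7 esp4', header = TRUE)

# 0/1 indicator matrix: one row per sample, one column per species
ind <- unclass(table(dataR$echant, dataR$espece))

# ind %*% t(ind) is 1 where two samples share a species, 0 otherwise;
# rescale to +1 / -1
sim <- tcrossprod(ind) * 2 - 1
dimnames(sim) <- list(rownames(ind), rownames(ind))  # keep the ech labels
sim
```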
Upvotes: 1