Reputation: 71
I have 5 different vectors and one reference vector I want to compare them to. I need to find which of the 5 is most similar to the reference.
The vectors are quite long, so I will just show part of them:
# Vector to compare to:
v1 <- c(0.2500, 0.4375, 0.1250, 0.3125, 0.0000, 0.5625, 0.1250, 0.1875, 0.1875, 0.1875, 0.1875)
# One of vectors to compare
v2 <- c(2, 1, 0, 1, 1, 1, 1, 0, 2, 1, 2)
# Another of vectors to compare:
v3 <- c(5, 0, 3, 1, 1, 2, 1, 2, 0, 1, 2)
In practice, I need a statistical test that compares the distributions (histograms) of those vectors and tells me which one is closest. I tried ks.test, but it complained about duplicate values in the vectors, and the p-value it returned was something like 0.0000000000001. Any ideas how to do that (other than visually)?
Upvotes: 2
Views: 2092
Reputation: 226172
It's not clear to me why you need a statistical test if all you want to do is compute which one is closest. Below I'm just computing the histograms directly and comparing their distances.
Generate data:
v1 <- c(0.2500, 0.4375, 0.1250, 0.3125, 0.0000, 0.5625, 0.1250,
0.1875, 0.1875, 0.1875, 0.1875)
v2 <- c(2, 1, 0, 1, 1, 1, 1, 0, 2, 1, 2)*0.1
v3 <- c(5, 0, 3, 1, 1, 2, 1, 2, 0, 1, 2)*0.1
Note that I scaled vectors 2 and 3 by 0.1 so that their distributions actually overlap with the comparison vector.
vList <- list(v1,v2,v3)
brkvec <- seq(0,0.7,by=0.1)
hList <- lapply(vList,function(x)
hist(x,plot=FALSE, breaks=brkvec)$counts )
This is a little bit inefficient because it computes all of the pairwise distances and then throws most of them away ...
dmat <- dist(do.call(rbind,hList))
dvec <- as.matrix(dmat)[-1,1]  ## keep only the distances to v1 (first column)
dvec
## 2 3
## 7.874008 6.000000
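The unused pairwise distances mentioned above can be avoided by computing the Euclidean distance from each candidate histogram to the reference directly. This is a minimal sketch, assuming the same data and break points as above; `ref` and `dvec_direct` are names introduced here purely for illustration.

```r
v1 <- c(0.2500, 0.4375, 0.1250, 0.3125, 0.0000, 0.5625, 0.1250,
        0.1875, 0.1875, 0.1875, 0.1875)
v2 <- c(2, 1, 0, 1, 1, 1, 1, 0, 2, 1, 2) * 0.1
v3 <- c(5, 0, 3, 1, 1, 2, 1, 2, 0, 1, 2) * 0.1

brkvec <- seq(0, 0.7, by = 0.1)
hList <- lapply(list(v1, v2, v3), function(x)
  hist(x, plot = FALSE, breaks = brkvec)$counts)

## reference histogram (hypothetical helper name)
ref <- hList[[1]]

## Euclidean distance from each candidate's bin counts to the reference
dvec_direct <- sapply(hList[-1], function(h) sqrt(sum((h - ref)^2)))
dvec_direct
```

This should give the same two distances as the `dist()` approach, without building the full distance matrix. For only a handful of vectors the difference is negligible, so either version is reasonable.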
The other option would be to ignore the warning from ks.test() (since it only affects inference, not the computation of the distance statistic):
ks.dist <- sapply(vList[-1],
function(x) suppressWarnings(ks.test(v1,x)$statistic))
ks.dist
## D D
## 0.6363636 0.4545455
The two approaches agree (i.e., v3 is closer to v1 than v2 is).
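To pick the closest vector programmatically (for example, out of the 5 candidates in the question), the distances can be fed to `which.min()`. The following is a rough sketch using the KS statistic; the `candidates` list and the `ks_stat` and `closest` names are assumptions made for this example, and only the two candidates shown in the question are included.

```r
v1 <- c(0.2500, 0.4375, 0.1250, 0.3125, 0.0000, 0.5625, 0.1250,
        0.1875, 0.1875, 0.1875, 0.1875)

## named list of candidates (hypothetical; extend with the remaining vectors)
candidates <- list(
  v2 = c(2, 1, 0, 1, 1, 1, 1, 0, 2, 1, 2) * 0.1,
  v3 = c(5, 0, 3, 1, 1, 2, 1, 2, 0, 1, 2) * 0.1
)

## KS distance statistic of each candidate against v1;
## unname() drops the "D" label so the list names are kept
ks_stat <- sapply(candidates, function(x)
  unname(suppressWarnings(ks.test(v1, x)$statistic)))

## name of the candidate with the smallest distance
closest <- names(candidates)[which.min(ks_stat)]
closest
```

The same pattern should work with the histogram distances: replace the function passed to `sapply()` with whichever distance measure is preferred.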
Upvotes: 2