Reputation: 135
I have two (and eventually more) character vectors that consist of several (ordered) names. Two examples of such vectors are:
[1] "original" "gai" "dea" "iap" "hso" "los" "ret" "dap" "wor" "agi"
[11] "fat" "con" "dep" "iso" "int"
and
[1] "int" "iso" "dep" "con" "fat" "agi" "wor" "dap" "ret" "los"
[11] "hso" "iap" "dea" "gai" "original"
(which can be recreated using):
c("original", "gai", "dea", "iap", "hso", "los", "ret", "dap",
"wor", "agi", "fat", "con", "dep", "iso", "int")
and
c("int", "iso", "dep", "con", "fat", "agi", "wor", "dap", "ret",
"los", "hso", "iap", "dea", "gai", "original")
Now I would like to compute the correlation between these two character vectors, based on the positions of the elements. For instance, in the first vector, the element "original" has position 1, but in the second it has position 15.
How would I go about this?
Thanks in advance!
Upvotes: 2
Views: 251
Reputation: 3791
For ordinal association, the Kendall rank correlation coefficient can be used. The catch here is to represent the strings as numbers so that the result makes sense. How to do that is for you to think about and decide. Here is one way to go about it using basic sequential numbers:
a <- c("original", "gai", "dea", "iap", "hso", "los", "ret", "dap",
"wor", "agi", "fat", "con", "dep", "iso", "int")
# numeric representation
a_num <- seq_along(a)
b <- c("int", "iso", "dep", "con", "fat", "agi", "wor", "dap", "ret",
"los", "hso", "iap", "dea", "gai", "original")
# numeric representation
b_num <- match(a,b)
# -------------------------------------------------------------------------
# eyeball the relationship
plot(a_num, b_num, type = "l", col="red", xlab = "a", ylab = "b")
# -------------------------------------------------------------------------
You can see a negative correlation, and you can further compute Kendall's rank correlation tau to test the null hypothesis of no association as follows:
# use method kendall
cor.test(a_num,b_num, method="kendall")
# Kendall's rank correlation tau
#
# data: a_num and b_num
# T = 0, p-value = 1.529e-12
# alternative hypothesis: true tau is not equal to 0
# sample estimates:
# tau
# -1
See ?cor.test for more. This is just to get you started. If you have ties in your data, Kendall's method handles that too, but you will need to read up on how it does so.
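Since the question mentions that more vectors will follow, the same idea can be extended to a whole set of orderings. The following is only a sketch: rank_cor_matrix is a hypothetical helper name, the three short vectors are made-up toy data, and it assumes every vector contains exactly the same unique names.

```r
# Hypothetical helper (not from any package): pairwise rank correlation
# between several orderings of the same set of names.
rank_cor_matrix <- function(orderings, method = "kendall") {
  ref <- orderings[[1]]
  # One column per ordering: position of each reference name in that ordering
  pos <- sapply(orderings, function(v) match(ref, v))
  # Assumption: all vectors hold the same names, so no position may be NA
  stopifnot(!anyNA(pos))
  cor(pos, method = method)
}

# Toy data, made up for illustration
orderings <- list(
  a = c("original", "gai", "dea", "iap", "hso"),
  b = c("hso", "iap", "dea", "gai", "original"),
  c = c("gai", "original", "dea", "iap", "hso")
)

m <- rank_cor_matrix(orderings)
m
```

The result is a symmetric matrix with one row and column per ordering, so each new vector only needs to be appended to the list.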
Upvotes: 2
Reputation: 345
I think you could simply use match for that.
v1 <- c("original", "gai", "dea", "iap", "hso", "los", "ret", "dap",
"wor", "agi", "fat", "con", "dep", "iso", "int")
v2 <- c("int", "iso", "dep", "con", "fat", "agi", "wor", "dap", "ret",
"los", "hso", "iap", "dea", "gai", "original")
library(dplyr)  # provides %>% and summarise()
data.frame(v1 = v1, pos1 = seq_along(v1), pos2 = match(v1, v2)) %>%
  summarise(corr = cor(pos1, pos2))
I'm not sure what this correlation would mean, though.
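One possible reading, offered as a sketch rather than a definitive interpretation: because both position columns are permutations of 1..n without ties, the default Pearson correlation of the positions coincides with Spearman's rank correlation, so the number measures how similarly the two vectors order the same names. The check below uses base R only and the vectors from the question.

```r
v1 <- c("original", "gai", "dea", "iap", "hso", "los", "ret", "dap",
        "wor", "agi", "fat", "con", "dep", "iso", "int")
v2 <- c("int", "iso", "dep", "con", "fat", "agi", "wor", "dap", "ret",
        "los", "hso", "iap", "dea", "gai", "original")

pos1 <- seq_along(v1)   # position of each name in v1
pos2 <- match(v1, v2)   # position of the same name in v2

pearson  <- cor(pos1, pos2)                       # default method
spearman <- cor(pos1, pos2, method = "spearman")  # rank-based

# With no ties, the two agree (up to floating point)
all.equal(pearson, spearman)
```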
Upvotes: 0
Reputation: 102309
I am not sure if this is what you meant:
> r <- data.frame(pos = seq_along(x1), posinX2 = match(x1,x2))
> r
   pos posinX2
1    1      15
2    2      14
3    3      13
4    4      12
5    5      11
6    6      10
7    7       9
8    8       8
9    9       7
10  10       6
11  11       5
12  12       4
13  13       3
14  14       2
15  15       1
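A table like this can then be passed to cor() to get a single number. The snippet below is a minimal sketch: the choice of Kendall's tau and of use = "complete.obs" (which drops names that match() could not find in the second vector) are assumptions, not something the question specifies.

```r
x1 <- c("original", "gai", "dea", "iap", "hso", "los", "ret", "dap",
        "wor", "agi", "fat", "con", "dep", "iso", "int")
x2 <- c("int", "iso", "dep", "con", "fat", "agi", "wor", "dap", "ret",
        "los", "hso", "iap", "dea", "gai", "original")

r <- data.frame(pos = seq_along(x1), posinX2 = match(x1, x2))

# Rank correlation between the two position columns.
# Assumption: rows with NA (a name missing from x2) are simply dropped.
tau <- cor(r$pos, r$posinX2, method = "kendall", use = "complete.obs")
tau
```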
DATA
x1 <- c("original", "gai", "dea", "iap", "hso", "los", "ret", "dap",
"wor", "agi", "fat", "con", "dep", "iso", "int")
x2 <- c("int", "iso", "dep", "con", "fat", "agi", "wor", "dap", "ret",
"los", "hso", "iap", "dea", "gai", "original")
Upvotes: 0