JNab

Reputation: 135

How to correlate vectors based on the position of the elements?

I have two (and eventually more) character vectors that consist of several (ordered) names. Two examples of such vectors are:

 [1] "original" "gai"      "dea"      "iap"      "hso"      "los"      "ret"      "dap"      "wor"      "agi"     
[11] "fat"      "con"      "dep"      "iso"      "int" 

and

 [1] "int"      "iso"      "dep"      "con"      "fat"      "agi"      "wor"      "dap"      "ret"      "los"     
[11] "hso"      "iap"      "dea"      "gai"      "original"

(which can be recreated using):

c("original", "gai", "dea", "iap", "hso", "los", "ret", "dap", 
"wor", "agi", "fat", "con", "dep", "iso", "int")

and

c("int", "iso", "dep", "con", "fat", "agi", "wor", "dap", "ret", 
"los", "hso", "iap", "dea", "gai", "original")

Now I would like to compute the correlation between these two character vectors, based on the positions of the elements. For instance, the element "original" has position 1 in the first vector but position 15 in the second.

How would I go about this?

Thanks in advance!

Upvotes: 2

Views: 251

Answers (3)

deepseefan

Reputation: 3791

For ordinal association, the Kendall rank correlation coefficient (τ) can be used. The catch is that the strings have to be represented as numbers for the result to make sense. How to do that is for you to think about and decide. Here is one way to go about it, using basic sequential numbers:

a <- c("original", "gai", "dea", "iap", "hso", "los", "ret", "dap", 
       "wor", "agi", "fat", "con", "dep", "iso", "int")

# numeric representation
a_num <- seq(a)

b <- c("int", "iso", "dep", "con", "fat", "agi", "wor", "dap", "ret", 
       "los", "hso", "iap", "dea", "gai", "original")

# numeric representation 
b_num <- match(a,b)
# -------------------------------------------------------------------------
# eye ball the relationship
plot(a_num, b_num, type = "l", col="red", xlab = "a", ylab = "b")

# -------------------------------------------------------------------------

Output

[Plot "kendall_out": b_num against a_num, drawn as a red line]

You can see a negative relationship. You can then compute Kendall's rank correlation tau and test the null hypothesis of no association as follows:

# use method kendall
cor.test(a_num,b_num, method="kendall") 
# Kendall's rank correlation tau
# 
# data:  a_num and b_num
# T = 0, p-value = 1.529e-12
# alternative hypothesis: true tau is not equal to 0
# sample estimates:
# tau 
#  -1 

See ?cor.test for more. This is just to get you started. If you have ties in your data, the Kendall method handles them too, but you will need to read up on how to do that.
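
Since the question mentions eventually having more than two vectors, the same idea might be extended along the lines of the sketch below. It is only an illustration: the shortened name list and the third ordering `v3` are made up, and it assumes every vector contains the same names exactly once. Each column of `pos` holds the positions of the names of `a` within one ordering, and `cor()` then returns all pairwise Kendall taus.

```r
# Shortened example; v3 is a hypothetical third ordering, not from the question.
a  <- c("original", "gai", "dea", "iap", "hso")
b  <- rev(a)
v3 <- c("gai", "original", "dea", "hso", "iap")

orderings <- list(a = a, b = b, v3 = v3)

# Column j: position of each name of `a` within ordering j.
pos <- sapply(orderings, function(v) match(a, v))
rownames(pos) <- a

# Pairwise Kendall rank correlations between all orderings.
tau <- cor(pos, method = "kendall")
print(round(tau, 2))
```

The entry for `a` and `b` is -1 because `b` is `a` fully reversed; the other entries depend on how many pairs of names swap their relative order.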

Upvotes: 2

Jeroen Colin

Reputation: 345

I think you could simply use match for that.

v1 <- c("original", "gai", "dea", "iap", "hso", "los", "ret", "dap", 
        "wor", "agi", "fat", "con", "dep", "iso", "int")
v2 <- c("int", "iso", "dep", "con", "fat", "agi", "wor", "dap", "ret", 
        "los", "hso", "iap", "dea", "gai", "original")

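library(dplyr)  # provides %>% and summarise()

# pos1: position in v1; pos2: position of the same name in v2
data.frame(v1 = v1, pos1 = seq_along(v1), pos2 = match(v1, v2)) %>% 
  summarise(corr = cor(pos1, pos2))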

I'm not sure what this correlation would mean, though.
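
One possible reading, offered as a sketch rather than a definitive interpretation: when both vectors contain the same names exactly once, the positions are already ranks, so the default Pearson correlation of the positions should coincide with Spearman's rank correlation. The base R check below (no dplyr needed) illustrates this for the vectors from the question.

```r
v1 <- c("original", "gai", "dea", "iap", "hso", "los", "ret", "dap",
        "wor", "agi", "fat", "con", "dep", "iso", "int")
v2 <- c("int", "iso", "dep", "con", "fat", "agi", "wor", "dap", "ret",
        "los", "hso", "iap", "dea", "gai", "original")

pos1 <- seq_along(v1)   # positions in v1: 1, 2, ..., 15
pos2 <- match(v1, v2)   # positions of the same names in v2

pearson  <- cor(pos1, pos2)                       # default method
spearman <- cor(pos1, pos2, method = "spearman")  # rank-based

# Both are computed on permutations of 1..15, so they agree.
print(all.equal(pearson, spearman))
```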

Upvotes: 0

ThomasIsCoding

Reputation: 102309

I am not sure if this is what you meant:

> r <- data.frame(pos = seq_along(x1), posinX2 = match(x1,x2))
> r
   pos posinX2
1    1      15
2    2      14
3    3      13
4    4      12
5    5      11
6    6      10
7    7       9
8    8       8
9    9       7
10  10       6
11  11       5
12  12       4
13  13       3
14  14       2
15  15       1
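
One caveat with a table like this, sketched under a made-up assumption: if a name from the first vector is missing from the second, `match()` returns `NA` for it, and `cor()` then returns `NA` unless incomplete pairs are dropped. The shortened vectors below are hypothetical and only illustrate that behaviour.

```r
# Hypothetical case: "original" does not occur in x2_short.
x1_short <- c("original", "gai", "dea", "iap", "hso")
x2_short <- c("hso", "iap", "dea", "gai")

r_short <- data.frame(pos = seq_along(x1_short),
                      posinX2 = match(x1_short, x2_short))

# The default use = "everything" propagates the NA.
tau_all <- cor(r_short$pos, r_short$posinX2, method = "kendall")

# Dropping incomplete pairs compares only the names present in both vectors.
tau_complete <- cor(r_short$pos, r_short$posinX2,
                    method = "kendall", use = "complete.obs")

print(c(all = tau_all, complete = tau_complete))
```

Whether dropping the missing names is appropriate depends on what the comparison is meant to express.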

DATA

x1 <- c("original", "gai", "dea", "iap", "hso", "los", "ret", "dap", 
        "wor", "agi", "fat", "con", "dep", "iso", "int")
x2 <- c("int", "iso", "dep", "con", "fat", "agi", "wor", "dap", "ret", 
        "los", "hso", "iap", "dea", "gai", "original")

Upvotes: 0
