Reputation:
I want to write code that applies a function calculating Spearman's rank correlation to every combination of columns in a dataset. I have the following dataset:
library(openxlsx)
data <- read.xlsx("e:/LINGUISTICS/mydata.xlsx", 1)
A B C D
go see get eat
see get eat go
get go go get
eat eat see see
The call cor(rank(x), rank(y), method = "spearman") measures the correlation between only two columns, e.g. between A and B:
cor(rank(data$A), rank(data$B), method = "spearman")
But I need to calculate correlation between all possible combinations of columns (AB, AC, AD, BC, BD, CD). I wrote the following function for that:
wert <- function(x, y) { cor(rank(x), rank(y), method = "spearman") }
I do not know how to apply my function to all possible combinations of columns (AB, AC, AD, BC, BD, CD) automatically, since my real data has many more columns. I would also like the results as a matrix of correlation scores, e.g. like the following table:
    A    B    C    D
A   1    0.3  0.4  0.8
B   0.3  1    0.6  0.5
C   0.4  0.6  1    0.1
D   0.8  0.5  0.1  1
Can somebody help me?
Upvotes: 1
Views: 3014
Reputation: 1
I think you can just make a function (pairedcolumns) that will then apply your function (spearman) to every pair of columns in the data frame you feed it.
# Apply a two-argument function (fun) to every pair of columns of a data frame (x)
# and return the results as a square matrix.
pairedcolumns <- function(x, fun) {
  n <- ncol(x)  # number of columns in the data frame
  foo <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      foo[i, j] <- fun(x[, i], x[, j])
    }
  }
  colnames(foo) <- rownames(foo) <- colnames(x)
  return(foo)
}
# Pass the function itself as the second argument, e.g. wert from the question.
results <- pairedcolumns(yourdataframe[, 2:8], wert)
Upvotes: 0
Reputation: 132746
You do not need rank. cor already calculates the Spearman rank correlation with method = "spearman". If you want the correlation between all columns of a data.frame, just pass the data.frame to cor, i.e. cor(data, method = "spearman"). You should study help("cor").
If you want to do this manually, use the combn function.
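A minimal sketch of the manual route with combn, assuming the sample data from the question is typed in by hand rather than read from Excel, and reusing the asker's wert function. The result matrix res and the intermediate names pairs and rho are illustrative only.

```r
# Sample data from the question, entered by hand (assumption: character columns).
data <- data.frame(
  A = c("go", "see", "get", "eat"),
  B = c("see", "get", "go", "eat"),
  C = c("get", "eat", "go", "see"),
  D = c("eat", "go", "get", "see"),
  stringsAsFactors = FALSE
)

# The asker's function from the question.
wert <- function(x, y) cor(rank(x), rank(y), method = "spearman")

# All unordered pairs of column names; each column of `pairs` is one pair.
pairs <- combn(names(data), 2)

# One correlation per pair.
rho <- apply(pairs, 2, function(p) wert(data[[p[1]]], data[[p[2]]]))

# Fill a symmetric matrix: 1 on the diagonal, rho in both triangles.
res <- diag(1, ncol(data))
dimnames(res) <- list(names(data), names(data))
res[t(pairs)] <- rho            # (row name, column name) index pairs
res[t(pairs)[, 2:1]] <- rho     # mirrored positions
res
```

Because combn only visits each pair once, this does about half the work of a full double loop, which may matter when there are many columns.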
PS: Your additional challenge is that you actually have factor variables. A rank for an unordered factor is a strange concept, but R just uses collation order here. Since cor rightly expects numeric input, you should do data[] <- lapply(data, as.integer) first.
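A short sketch of the whole-data-frame approach on the question's sample data, typed in by hand. One assumption to flag: if the columns arrive as character strings rather than factors, as.integer alone would return NA for words like "go", so this sketch goes through factor() first, which covers both cases. The name codes is illustrative.

```r
# Sample data from the question, entered by hand (assumption: character columns).
data <- data.frame(
  A = c("go", "see", "get", "eat"),
  B = c("see", "get", "go", "eat"),
  C = c("get", "eat", "go", "see"),
  D = c("eat", "go", "get", "see"),
  stringsAsFactors = FALSE
)

# Convert every column to integer codes; factor() orders the words by collation.
codes <- data
codes[] <- lapply(data, function(x) as.integer(factor(x)))

# cor() on a data frame returns the full correlation matrix in one call.
rho <- cor(codes, method = "spearman")
rho
```

The row and column names of rho are taken from the data frame, so the output already has the layout asked for in the question.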
Upvotes: 0