maxheld

Reputation: 4273

How can I calculate the correlation coefficients on the third dimension of an array?

Say I have an array with three dimensions: items as rows, items as columns, and participants as the third dimension, with co-occurrence counts as values. Notice further that each of the array "slices" (= item x item matrices) is symmetric (because they're co-occurrence counts!).

Like so:

a <- structure(c(17L, 1L, 0L, 1L, 1L, 17L, 0L, 1L, 0L, 0L, 17L, 0L, 1L, 1L, 0L, 17L, 16L, 0L, 0L, 1L, 0L, 16L, 0L, 0L, 0L, 0L, 16L, 0L, 1L, 0L, 0L, 16L, 18L, 1L, 2L, 3L, 1L, 18L, 1L, 2L, 2L, 1L, 18L, 0L, 3L, 2L, 0L, 18L), .Dim = c(4L, 4L, 3L), .Dimnames = structure(list(items = c("but-how", "encyclopedia", "alien", "comma"), items = c("but-how", "encyclopedia", "alien", "comma"), people = c("Julius", "Tashina", "Azra")), .Names = c("items", "items", "people")))

I now want the participants x participants correlation matrix, that is, the pairwise coefficients for Julius, Tashina and Azra. To get it, I just want to correlate the corresponding cells of each pair of matrices, so for Azra and Tashina, I'd correlate their respective upper (or lower) triangles.

It's not obvious to me how to do this, since cor() and friends don't accept arrays.

I can hack this together with some apply() and upper.tri() action, as shown below, but I am guessing there has to be a more efficient, matrix-magical way to do this, right?


Here's the hacky way I'm doing this now. Don't laugh.

loosedat <- apply(X = a, MARGIN = 3, FUN = function(x) {
    x[upper.tri(x, diag = FALSE)]  # drop the diagonal; it would otherwise inflate the correlations
})
cor(loosedat)

Gets me what I want, but I feel dirty doing it.

           Julius   Tashina     Azra
Julius  1.0000000 0.4472136 0.522233
Tashina 0.4472136 1.0000000 0.700649
Azra    0.5222330 0.7006490 1.000000

Upvotes: 0

Views: 107

Answers (1)

Zheyuan Li

Reputation: 73275

How about

n <- dim(a)[3L]    ## number of people
m <- dim(a)[1L]    ## dimension of each square item-by-item table
id <- dimnames(a)[[3L]]    ## names of people
uptri <- upper.tri(diag(m))    ## logical mask of the strict upper triangle
## the m x m mask is recycled over all n slices, giving one column per person
loosedat <- matrix(as.numeric(a)[uptri], ncol = n, dimnames = list(NULL, id))
#     Julius Tashina Azra
#[1,]      1       0    1
#[2,]      0       0    2
#[3,]      0       0    1
#[4,]      1       1    3
#[5,]      1       0    2
#[6,]      0       0    0

cor(loosedat)
#           Julius   Tashina     Azra
#Julius  1.0000000 0.4472136 0.522233
#Tashina 0.4472136 1.0000000 0.700649
#Azra    0.5222330 0.7006490 1.000000

You could squeeze the code above into a single line, but I took the step-by-step approach for readability.
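For reference, a single-line version might look like the sketch below. It rebuilds the question's array `a` so that the snippet runs on its own, and it relies on R recycling the 4 x 4 logical mask across all three slices. The name `one_liner` is only illustrative.

```r
# Rebuild the example array from the question (4 items x 4 items x 3 people).
items <- c("but-how", "encyclopedia", "alien", "comma")
a <- array(
  c(17L, 1L, 0L, 1L, 1L, 17L, 0L, 1L, 0L, 0L, 17L, 0L, 1L, 1L, 0L, 17L,
    16L, 0L, 0L, 1L, 0L, 16L, 0L, 0L, 0L, 0L, 16L, 0L, 1L, 0L, 0L, 16L,
    18L, 1L, 2L, 3L, 1L, 18L, 1L, 2L, 2L, 1L, 18L, 0L, 3L, 2L, 0L, 18L),
  dim = c(4L, 4L, 3L),
  dimnames = list(items = items, items = items,
                  people = c("Julius", "Tashina", "Azra"))
)

# The mask from the first slice has length 16; indexing the length-48 array
# with it recycles the mask, so the strict upper triangle of every slice is
# extracted in order. Reshaping gives one column per person.
one_liner <- cor(matrix(a[upper.tri(a[, , 1L])], ncol = dim(a)[3L],
                        dimnames = list(NULL, dimnames(a)[[3L]])))
one_liner
```

This should agree with the `apply()`-based result in the question; whether the compact form is worth the loss in readability is a matter of taste.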

Upvotes: 1
