R: Symmetrical relationship matrix from frequency data frame

Question

I have a data frame with frequencies in R like this:

    V1 V2 V3 V4
row1 1  2  0  1
row2 0  6  0  3
row3 3  0  0  0
row4 0  0  2  0
row5 4  1  0  0
row6 3  0  1  1
(more rows)

a<-as.data.frame(matrix(c(1,2,0,1,0,6,0,3,3,0,0,0,0,0,2,0,4,1,0,0,3,0,1,1),byrow=T,ncol=4))

I want a function to calculate, for each row, matches between columns where both values are > 0, so I get a relationship matrix for V1-V4, like this:

    V1 V2 V3 V4
V1
V2   2
V3   1  0
V4   2  2  1

Is there some handy function available? Or how should I do this?

lmo · Accepted Answer

Here is a base R method using combn, sapply, and rowSums.

# df is the data frame from the question (called a there)
df <- a

# get the pairwise combination of variables
varComb <- combn(names(df), 2)
varComb
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "V1" "V1" "V1" "V2" "V2" "V3"
[2,] "V2" "V3" "V4" "V3" "V4" "V4"

# get the counts
counts <- sapply(seq_len(ncol(varComb)),
                 function(i) sum(rowSums(df[,varComb[,i]] > 0) == 2))

Here, the variable combinations are used to subset the data frame, and each subset is converted to a logical matrix that records whether each value is greater than 0. rowSums adds up each row of that logical matrix, and sum counts the rows whose total equals 2, that is, the rows where both variables are positive. sapply applies this counting to every pair of variables in varComb.
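To make the counting step concrete, here is a small sketch that runs it for a single pair (V1 and V2) one operation at a time. It rebuilds the question's data under the name df; the intermediate names pairPos, nPos, bothPos, and pairCount are only illustrative and do not appear in the answer.

```r
# Data from the question (named df here to match the answer's code)
df <- as.data.frame(matrix(c(1,2,0,1, 0,6,0,3, 3,0,0,0,
                             0,0,2,0, 4,1,0,0, 3,0,1,1),
                           byrow = TRUE, ncol = 4))

# Logical matrix: TRUE where the value is greater than 0
pairPos <- df[, c("V1", "V2")] > 0

# Number of positive values in each row (0, 1, or 2)
nPos <- rowSums(pairPos)

# TRUE for rows where both V1 and V2 are positive
bothPos <- nPos == 2

# Count of such rows for this pair
pairCount <- sum(bothPos)
pairCount
```

Only rows 1 and 5 have positive values in both V1 and V2, so pairCount is 2, which matches the first entry of counts.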

# put these into a data frame
dfNew <- setNames(data.frame(t(varComb), counts), c("var1", "var2", "counts"))
dfNew
  var1 var2 counts
1   V1   V2      2
2   V1   V3      1
3   V1   V4      2
4   V2   V3      0
5   V2   V4      2
6   V3   V4      1

To put these results together, we can use setNames, which lets us create the data frame and name its variables in one line.

To put this result into a matrix, you could use cbind and matrix subsetting:

# construct empty matrix
tempMat <- matrix(NA, 4, 4)

# fill it in
tempMat[cbind(as.integer(substr(dfNew$var2, 2, 2)),
              as.integer(substr(dfNew$var1, 2, 2)))] <- dfNew$counts

tempMat
     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]    2   NA   NA   NA
[3,]    1    0   NA   NA
[4,]    2    2    1   NA

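as.integer and substr extract the row and column indices in which to place the values; cbind combines them into a two-column matrix, which is used for matrix subsetting. Note that substr(x, 2, 2) reads only the second character of each name, so this relies on names of the form V1 to V9.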
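As a possible alternative, not part of the method above, the same counts can be obtained in one step with crossprod on a 0/1 indicator matrix. This is a minimal sketch under the assumption that the data are the six rows shown in the question; the names ind and coMat are made up for illustration. Because the result carries the column names as dimnames, it does not depend on how the variables are named.

```r
# Data from the question (named df here to match the answer's code)
df <- as.data.frame(matrix(c(1,2,0,1, 0,6,0,3, 3,0,0,0,
                             0,0,2,0, 4,1,0,0, 3,0,1,1),
                           byrow = TRUE, ncol = 4))

# 0/1 indicator matrix: 1 where the value is greater than 0
ind <- (as.matrix(df) > 0) * 1

# crossprod(ind) computes t(ind) %*% ind; entry [i, j] is the number of
# rows in which columns i and j are both positive (symmetric matrix)
coMat <- crossprod(ind)

# Blank the diagonal and upper triangle to match the layout in the question
coMat[upper.tri(coMat, diag = TRUE)] <- NA
coMat
```

Before the last step coMat is symmetric, and its diagonal holds the number of positive rows per variable. If the full symmetric matrix is wanted, the upper.tri line can simply be dropped.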
