Zwentibold

Reputation: 139

R: Symmetrical relationship matrix from frequency data frame

I have a data frame with frequencies in R like this:

    V1 V2 V3 V4
row1 1  2  0  1
row2 0  6  0  3
row3 3  0  0  0
row4 0  0  2  0
row5 4  1  0  0
row6 3  0  1  1
(more rows)

a<-as.data.frame(matrix(c(1,2,0,1,0,6,0,3,3,0,0,0,0,0,2,0,4,1,0,0,3,0,1,1),byrow=T,ncol=4))

I want a function that counts, for each pair of columns, the number of rows in which both values are > 0, so that I get a relationship matrix for V1-V4 like this:

    V1 V2 V3 V4
V1
V2   2
V3   1  0
V4   2  2  1
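Stated formally, with x_ij the value in row i and column j and n the number of rows (notation introduced here for illustration), each entry of the desired matrix is:

```latex
C_{jk} = \sum_{i=1}^{n} \mathbf{1}[x_{ij} > 0]\,\mathbf{1}[x_{ik} > 0], \qquad j > k
```

With the indicator matrix B_ij = 1[x_ij > 0], this is the (j, k) entry of B^T B, restricted to the lower triangle.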

Is there some handy function available? Or how should I do this?

Upvotes: 0

Views: 304

Answers (2)

lmo

Reputation: 38500

Here is a base R method using combn, sapply, and rowSums.

# the code below refers to the question's data frame as df
df <- a

# get the pairwise combination of variables
varComb <- combn(names(df), 2)
varComb
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "V1" "V1" "V1" "V2" "V2" "V3"
[2,] "V2" "V3" "V4" "V3" "V4" "V4"

# get the counts
counts <- sapply(seq_len(ncol(varComb)),
                 function(i) sum(rowSums(df[,varComb[,i]] > 0) == 2))

Here, the variable combinations are used to subset the data frame, which is then converted to a logical matrix indicating whether each value is greater than 0. rowSums adds up each row of that matrix, and sum counts the rows whose total equals 2, i.e. the rows where both variables are positive. sapply applies this count to every pair of variables in varComb.
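To make the counting step concrete, here is the same logic applied by hand to a single pair, V1 and V2. The question's data is rebuilt so the snippet runs on its own; pos and perRow are illustrative names, not part of the answer.

```r
a <- as.data.frame(matrix(c(1,2,0,1, 0,6,0,3, 3,0,0,0, 0,0,2,0, 4,1,0,0, 3,0,1,1),
                          byrow = TRUE, ncol = 4))

# logical matrix: is each V1 / V2 value positive?
pos <- a[, c("V1", "V2")] > 0

# number of positive values in each row (0, 1 or 2)
perRow <- rowSums(pos)

# rows where both variables are positive have a total of 2
sum(perRow == 2)  # 2
```

Only rows 1 and 5 have both V1 and V2 positive, which matches the first entry of the counts.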

# put these into a data frame
dfNew <- setNames(data.frame(t(varComb), counts), c("var1", "var2", "counts"))
dfNew
  var1 var2 counts
1   V1   V2      2
2   V1   V3      1
3   V1   V4      2
4   V2   V3      0
5   V2   V4      2
6   V3   V4      1

To put these results together, setNames lets us create the data frame and name its variables in one line.


To put this result into a matrix, you could use cbind and matrix subsetting:

# construct empty matrix
tempMat <- matrix(NA, 4, 4)

# fill it in
tempMat[cbind(as.integer(substr(dfNew$var2, 2, 2)),
              as.integer(substr(dfNew$var1, 2, 2)))] <- dfNew$counts

tempMat
     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]    2   NA   NA   NA
[3,]    1    0   NA   NA
[4,]    2    2    1   NA

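substr and as.integer extract the row and column indices at which to place the values, and cbind combines them into a two-column index matrix that is used for matrix subsetting. Note that the substr step relies on the variable names being V1 to V9 (a single digit after the "V").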
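A variant that does not depend on the column names ending in a single digit: look up each variable's position with match() instead of substr(). This is a substitution of my own, not part of the answer; it rebuilds the data and the pair counts so it runs on its own, and it labels the result with the variable names.

```r
a <- as.data.frame(matrix(c(1,2,0,1, 0,6,0,3, 3,0,0,0, 0,0,2,0, 4,1,0,0, 3,0,1,1),
                          byrow = TRUE, ncol = 4))

# pair counts, computed as in the answer
varComb <- combn(names(a), 2)
counts <- sapply(seq_len(ncol(varComb)),
                 function(i) sum(rowSums(a[, varComb[, i]] > 0) == 2))

# empty matrix with one labelled row and column per variable
vars <- names(a)
tempMat <- matrix(NA_integer_, length(vars), length(vars),
                  dimnames = list(vars, vars))

# match() converts names to positions; the second variable of each pair is the row
tempMat[cbind(match(varComb[2, ], vars), match(varComb[1, ], vars))] <- counts
tempMat
```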

Upvotes: 1

Zwentibold

Reputation: 139

Okay, after a bit of fiddling around, here's what I came up with:

a<-as.data.frame(matrix(c(1,2,0,1,0,6,0,3,3,0,0,0,0,0,2,0,4,1,0,0,3,0,1,1),byrow=T,ncol=4))
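a[a>0]<-1   # recode frequencies to presence (1) / absence (0)
a<-t(a)     # transpose: one row per variable V1-V4
# for each pair of variables, count the observations where both are non-zero
mat<-outer(1:nrow(a), 1:nrow(a), FUN=Vectorize(function(x,y) sum(a[x,]!=0 & a[y,]!=0)))
mat[upper.tri(mat,diag=T)] <- 0   # keep only the lower triangle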

Not pretty, but it seems to work.
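An alternative that neither answer uses: the same counts come out of a single matrix product. If B is the presence/absence matrix (1 where a value is positive), crossprod(B) computes t(B) %*% B, whose [j, k] entry is the number of rows where both column j and column k are positive. A minimal sketch on the question's data:

```r
a <- as.data.frame(matrix(c(1,2,0,1, 0,6,0,3, 3,0,0,0, 0,0,2,0, 4,1,0,0, 3,0,1,1),
                          byrow = TRUE, ncol = 4))

# presence/absence matrix: 1 where the frequency is positive, else 0
B <- (as.matrix(a) > 0) * 1

# t(B) %*% B: entry [j, k] counts rows where both columns j and k are positive
mat <- crossprod(B)

# keep only the lower triangle, as in the desired output
mat[upper.tri(mat, diag = TRUE)] <- NA
mat
```

Before the upper triangle is blanked out, the diagonal of crossprod(B) holds the number of positive values in each column, which can be useful on its own.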

Upvotes: 0
