Reputation: 139
I have a data frame with frequencies in R like this:
     V1 V2 V3 V4
row1  1  2  0  1
row2  0  6  0  3
row3  3  0  0  0
row4  0  0  2  0
row5  4  1  0  0
row6  3  0  1  1
(more rows)
a<-as.data.frame(matrix(c(1,2,0,1,0,6,0,3,3,0,0,0,0,0,2,0,4,1,0,0,3,0,1,1),byrow=T,ncol=4))
I want a function that counts, for each pair of columns, the rows in which both values are > 0, so that I get a relationship matrix for V1-V4 like this:
   V1 V2 V3 V4
V1
V2  2
V3  1  0
V4  2  2  1
Is there some handy function available? Or how should I do this?
Upvotes: 0
Views: 304
Reputation: 38500
Here is a base R method using combn, sapply, and rowSums.
# df is the data frame from the question (called a there)
df <- a
# get the pairwise combinations of variables
varComb <- combn(names(df), 2)
varComb
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "V1" "V1" "V1" "V2" "V2" "V3"
[2,] "V2" "V3" "V4" "V3" "V4" "V4"
# get the counts
counts <- sapply(seq_len(ncol(varComb)),
                 function(i) sum(rowSums(df[, varComb[, i]] > 0) == 2))
Here, each variable pair is used to subset the data frame, and the comparison with 0 converts that subset to a logical matrix. rowSums adds up each row, so a row sums to 2 exactly when both values are greater than 0, and sum counts how many rows meet that condition. sapply applies this counting to every pair of variables in varComb.
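To make the counting step concrete, here is a minimal sketch for a single pair (V1 and V2) on the question's data. The names pair and both are only illustrative and are not part of the answer's code.

```r
# the example data from the question
a <- as.data.frame(matrix(c(1,2,0,1,0,6,0,3,3,0,0,0,0,0,2,0,4,1,0,0,3,0,1,1),
                          byrow = TRUE, ncol = 4))

# logical matrix: TRUE where the value is greater than 0
pair <- a[, c("V1", "V2")] > 0

# a row sums to 2 only when both columns are TRUE
both <- rowSums(pair) == 2

# number of rows where V1 and V2 are both positive
n_both <- sum(both)
n_both
```

Rows 1 and 5 are the only rows with positive values in both V1 and V2, which gives the count of 2 for that pair.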
# put these into a data frame (stored as dfNew, which is used below)
dfNew <- setNames(data.frame(t(varComb), counts), c("var1", "var2", "counts"))
dfNew
  var1 var2 counts
1   V1   V2      2
2   V1   V3      1
3   V1   V4      2
4   V2   V3      0
5   V2   V4      2
6   V3   V4      1
To put these results together, we can use setNames, which lets us create a data frame and name its variables in one line.
To put this result into a matrix, you could use cbind and matrix subsetting:
# construct empty matrix
tempMat <- matrix(NA, 4, 4)
# fill it in
tempMat[cbind(as.integer(substr(dfNew$var2, 2, 2)),
              as.integer(substr(dfNew$var1, 2, 2)))] <- dfNew$counts
tempMat
     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]    2   NA   NA   NA
[3,]    1    0   NA   NA
[4,]    2    2    1   NA
as.integer and substr extract the digit from each variable name, which gives the row and column in which to place each value. cbind combines these into a two-column matrix, which is used for matrix subsetting.
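Because substr(x, 2, 2) reads only one character, the fill-in step above assumes names like V1 to V9. A possible variation, sketched here under the assumption that the columns may have arbitrary names, looks up positions with match instead. The names vars, idx, and res are illustrative and are not from the answer.

```r
# the example data from the question
a <- as.data.frame(matrix(c(1,2,0,1,0,6,0,3,3,0,0,0,0,0,2,0,4,1,0,0,3,0,1,1),
                          byrow = TRUE, ncol = 4))

vars <- names(a)
varComb <- combn(vars, 2)
counts <- sapply(seq_len(ncol(varComb)),
                 function(i) sum(rowSums(a[, varComb[, i]] > 0) == 2))

# empty matrix labelled with the variable names
res <- matrix(NA, length(vars), length(vars), dimnames = list(vars, vars))

# two-column index matrix: row position, column position
idx <- cbind(match(varComb[2, ], vars), match(varComb[1, ], vars))

# each row of idx addresses one cell of res
res[idx] <- counts
res
```

Indexing a matrix with a two-column matrix treats each row as one (row, column) pair, so the six counts land in the lower triangle and everything else stays NA.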
Upvotes: 1
Reputation: 139
Okay, after a bit of fiddling around, here's what I came up with:
a <- as.data.frame(matrix(c(1,2,0,1,0,6,0,3,3,0,0,0,0,0,2,0,4,1,0,0,3,0,1,1), byrow = TRUE, ncol = 4))
a[a > 0] <- 1
a <- t(a)
mat <- outer(1:nrow(a), 1:nrow(a), FUN = Vectorize(function(x, y) sum(a[x, ] != 0 & a[y, ] != 0)))
mat[upper.tri(mat, diag = TRUE)] <- 0
Not pretty, but it seems to work.
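The same counts can probably be had without outer and Vectorize. Once the data is reduced to 0/1, the matrix cross-product t(m) %*% m counts the rows in which both columns are 1. This is only a sketch of that alternative, not something from the thread, and the names m and co are illustrative.

```r
# the example data from the question
a <- as.data.frame(matrix(c(1,2,0,1,0,6,0,3,3,0,0,0,0,0,2,0,4,1,0,0,3,0,1,1),
                          byrow = TRUE, ncol = 4))

# 0/1 indicator matrix: 1 where the value is greater than 0
m <- (as.matrix(a) > 0) * 1

# crossprod(m) is t(m) %*% m: entry [i, j] counts rows where columns i and j are both 1
co <- crossprod(m)

# keep only the lower triangle, as in the desired output
co[upper.tri(co, diag = TRUE)] <- NA
co
```

Before masking, the diagonal holds the number of positive entries in each column, so leave out the upper.tri line if those are also of interest.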
Upvotes: 0