ChrisDH

Reputation: 325

Generating and Summing Matrix

I'm relatively new to R, so forgive me for what I believe to be a relatively simple question.

I have data in the form

    1   2   3   4   5
A   0   1   1   0   0
B   1   0   1   0   1
C   0   1   0   1   0
D   1   0   0   0   0
E   0   0   0   0   1

where A-E are people and 1-5 are binary indicators of whether or not they have that quality. I need to make a matrix of A-E where cell A,B = 1 if, for any quality 1-5, the values for A and B sum to 2 (that is, if they share at least one quality). The simple 5x5 would be:

    A   B   C   D   E
A   1               
B   1   1           
C   1   0   1       
D   0   1   0   1   
E   0   1   0   0   1

I then need to sum the entire matrix. (Above would be 9). I have thousands of observations, so I can't do this by hand. I'm sure there is an easy few lines of code, I'm just not experienced enough.

Thanks!

EDIT: I've imported the data from a .csv file with the columns (1-5 above) as variables; in the real data I have 40 variables. A-E are unique ID observations of people, approximately 2000 of them. I would also like to know how to first convert this into a matrix, in order to execute the great answers you have already provided. Thanks!

Upvotes: 9

Views: 146

Answers (3)

DatamineR

Reputation: 9618

What about this? (Of course, it is not as elegant as the tcrossprod solution.)

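d <- dim(m)
# all ordered pairs of row indices (people)
ind <- expand.grid(1:d[1], 1:d[1])
# put the two rows of each pair side by side, then check for a shared quality;
# d[2] (number of qualities) splits each combined row into its two halves
M <- matrix(as.numeric(apply(cbind(m[ind[,2], ], m[ind[,1], ]), 1,
  function(x) sum(x[1:d[2]] == 1 & x[(d[2]+1):(d[2]*2)] == 1) >= 1)), ncol = d[1])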

rownames(M) = colnames(M) = rownames(m)
M
  A B C D E
A 1 1 1 0 0
B 1 1 0 1 1
C 1 0 1 0 0
D 0 1 0 1 0
E 0 1 0 0 1

Upvotes: 1

user20650

Reputation: 25844

You can use matrix multiplication here:

out <- tcrossprod(m)
#   A B C D E
# A 2 1 1 0 0
# B 1 3 0 1 1
# C 1 0 2 0 0
# D 0 1 0 1 0
# E 0 1 0 0 1

Then set the diagonal to one, if required:

diag(out) <- 1
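
One caveat for the real data: tcrossprod returns the number of shared qualities, so with 40 qualities an off-diagonal cell can be larger than 1. A minimal sketch of thresholding the counts, assuming a strict 0/1 matrix is wanted as described in the question (the names shared and adj are only illustrative):

```r
# Toy data from the question: rows are people, columns are qualities.
m <- matrix(c(0, 1, 1, 0, 0,
              1, 0, 1, 0, 1,
              0, 1, 0, 1, 0,
              1, 0, 0, 0, 0,
              0, 0, 0, 0, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(LETTERS[1:5], NULL))

shared <- tcrossprod(m)     # number of qualities each pair has in common
adj <- (shared > 0) + 0L    # 1 if at least one quality is shared, else 0
diag(adj) <- 1              # force the diagonal to 1, as in the question
adj
```

On the toy data this gives the same matrix as above, because no pair shares more than one quality.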

As DavidA points out in the comments, tcrossprod is basically doing m %*% t(m).

There are several ways to then calculate the sum; here is one:

sum(out[upper.tri(out, diag=TRUE)] , na.rm=TRUE)
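
The edit to the question asks how to get from the imported CSV to a matrix first. A rough sketch of the whole pipeline, assuming the first column of the file holds the person IDs and every other column is a 0/1 quality; the column names (id, q1 to q5) and the inline csv_text are made up for illustration, and in practice you would pass your file path to read.csv instead:

```r
# Hypothetical CSV content standing in for the real file.
csv_text <- "id,q1,q2,q3,q4,q5
A,0,1,1,0,0
B,1,0,1,0,1
C,0,1,0,1,0
D,1,0,0,0,0
E,0,0,0,0,1"

# row.names = 1 turns the first column (the IDs) into row names,
# so only the 0/1 quality columns remain as data.
df <- read.csv(text = csv_text, row.names = 1)
m <- as.matrix(df)                 # people x qualities matrix

adj <- (tcrossprod(m) > 0) + 0L    # 1 if a pair shares at least one quality
diag(adj) <- 1
total <- sum(adj[upper.tri(adj, diag = TRUE)])
total                              # 9 for the example data
```

With about 2000 people the result is a 2000 x 2000 matrix, which should fit comfortably in memory.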

Upvotes: 7

Colonel Beauvel

Reputation: 31161

You can use outer, if m is your matrix with one row per person:

f = Vectorize(function(u,v) any(colSums(m[c(u,v),])>1)+0L)

# iterate over pairs of rows (people), so this also works when m is not square
res = outer(1:nrow(m), 1:nrow(m), FUN=f)
colnames(res) = row.names(res) = rownames(m)

#  A B C D E
#A 1 1 1 0 0
#B 1 1 0 1 1
#C 1 0 1 0 0
#D 0 1 0 1 0
#E 0 1 0 0 1

Data:

m = structure(c(0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 
1, 0, 0, 0, 1, 0, 0, 1), .Dim = c(5L, 5L), .Dimnames = list(c("A", 
"B", "C", "D", "E"), NULL))

Upvotes: 1
