Reputation: 325
I'm relatively new to R, so forgive me for what I believe to be a relatively simple question.
I have data in the form
  1 2 3 4 5
A 0 1 1 0 0
B 1 0 1 0 1
C 0 1 0 1 0
D 1 0 0 0 0
E 0 0 0 0 1
where A-E are people and columns 1-5 are binary indicators of whether or not each person has that quality. I need to make an A-E by A-E matrix where cell (A,B) = 1 if, for any quality 1-5, the values for A and B sum to 2 (that is, if they share at least one quality). The simple 5x5 would be:
  A B C D E
A 1
B 1 1
C 1 0 1
D 0 1 0 1
E 0 1 0 0 1
I then need to sum the entire matrix. (Above would be 9). I have thousands of observations, so I can't do this by hand. I'm sure there is an easy few lines of code, I'm just not experienced enough.
Thanks!
EDIT: I've imported the data from a .csv file with the columns (1-5 above) as variables; in the real data I have 40 variables. A-E are unique IDs for the people observed, approximately 2000 in total. I would also like to know how to first convert this into a matrix, in order to use the great answers you have already provided. Thanks!
Upvotes: 9
Views: 146
Reputation: 9618
What about this? (Of course, it is not as elegant as the tcrossprod solution.)
d <- dim(m)
ind <- expand.grid(1:d[1],1:d[1])
# For every pair of rows, test whether any column is 1 in both rows
M <- matrix(as.numeric(apply(cbind(m[ind[,2],], m[ind[,1],]), 1,
  function(x) sum(x[1:d[2]] == 1 & x[(d[2]+1):(d[2]*2)] == 1) >= 1)), ncol = d[1])
rownames(M) = colnames(M) = rownames(m)
M
  A B C D E
A 1 1 1 0 0
B 1 1 0 1 1
C 1 0 1 0 0
D 0 1 0 1 0
E 0 1 0 0 1
Upvotes: 1
Reputation: 25844
You can use matrix multiplication here
out <- tcrossprod(m)
#   A B C D E
# A 2 1 1 0 0
# B 1 3 0 1 1
# C 1 0 2 0 0
# D 0 1 0 1 0
# E 0 1 0 0 1
Then set the diagonal to one, if required
diag(out) <- 1
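One caveat, offered as a sketch rather than as part of the original answer: tcrossprod returns the number of shared qualities, so with more columns an off-diagonal cell can exceed 1. Assuming the goal is a 0/1 indicator, thresholding the counts handles that case. The toy matrix m2 and the names counts and shared are made up for illustration.

```r
# Hypothetical 3-person example in which A and B share two qualities
m2 <- matrix(c(1, 1, 0,
               1, 1, 1,
               0, 0, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("A", "B", "C"), NULL))

counts <- tcrossprod(m2)      # counts[i, j] = number of qualities i and j share
shared <- (counts > 0) + 0L   # 1 if at least one shared quality, else 0
diag(shared) <- 1L            # force the diagonal to 1, as in the question
shared
```

Here counts["A", "B"] is 2, while shared["A", "B"] is 1.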
As DavidA points out in the comments, tcrossprod is basically doing m %*% t(m).
There are several ways to then calculate the sum; here is one:
sum(out[upper.tri(out, diag=TRUE)] , na.rm=TRUE)
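For the follow-up in the question's edit (data read from a .csv), a minimal sketch of the whole pipeline might look like the following. The data frame df, its id column, and the q1 to q5 column names are assumptions for illustration; with a real file, read.csv would produce the data frame instead.

```r
# Stand-in for the imported data; the column names here are hypothetical
df <- data.frame(
  id = c("A", "B", "C", "D", "E"),
  q1 = c(0, 1, 0, 1, 0),
  q2 = c(1, 0, 1, 0, 0),
  q3 = c(1, 1, 0, 0, 0),
  q4 = c(0, 0, 1, 0, 0),
  q5 = c(0, 1, 0, 0, 1)
)

# Drop the ID column, convert to a numeric matrix, keep the IDs as row names
m <- as.matrix(df[, setdiff(names(df), "id")])
rownames(m) <- as.character(df$id)

# tcrossprod(m) is the same product as m %*% t(m)
stopifnot(isTRUE(all.equal(tcrossprod(m), m %*% t(m))))

out <- (tcrossprod(m) > 0) + 0L   # 1 where two people share a quality
diag(out) <- 1L                   # each person matches themselves

# Count each pair once, diagonal included
total <- sum(out[upper.tri(out, diag = TRUE)])
total   # 9, matching the expected sum in the question
```

Because the matrix is symmetric, summing only the upper triangle counts each pair once, which matches the lower-triangular layout shown in the question.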
Upvotes: 7
Reputation: 31161
You can use outer, where m is your matrix with one row per person:
f = Vectorize(function(u,v) any(colSums(m[c(u,v),])>1)+0L)
res = outer(1:nrow(m), 1:nrow(m), FUN=f)
colnames(res) = row.names(res) = rownames(m)
#   A B C D E
# A 1 1 1 0 0
# B 1 1 0 1 1
# C 1 0 1 0 0
# D 0 1 0 1 0
# E 0 1 0 0 1
Data:
m = structure(c(0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0,
1, 0, 0, 0, 1, 0, 0, 1), .Dim = c(5L, 5L), .Dimnames = list(c("A",
"B", "C", "D", "E"), NULL))
Upvotes: 1