Reputation: 587
I'm an enthusiastic R newbie that needs some help! :)
I have a data frame that looks like this:
id<-c(100,200,300,400)
a<-c(1,1,0,1)
b<-c(1,0,1,0)
c<-c(0,0,1,1)
y=data.frame(id=id,a=a,b=b,c=c)
Where id is an unique identifier (e.g. a person) and a, b and c are dummy variables for whether the person has this feature or not (as always 1=TRUE).
I want R to create a matrix or data frame where I have the variables a, b and c both as the names of the columns and of the rows. For the values of the matrix R will have to calculate the number of identifiers that have this feature, or the combination of features.
So for example, IDs 100, 200 and 400 have feature a then in the diagonal of the matrix where a and a cross, R will input 3. Only ID 100 has both features a and b, hence R will input 1 where a and b cross, and so forth.
The resulting data frame will have to look like this:
l<-c("","a","b","c")
m<-c("a",3,1,1)
n<-c("b",1,2,1)
o<-c("c",1,1,2)
result<-matrix(c(l,m,n,o),nrow=4,ncol=4)
As my data set has 10 variables and hundreds of observations, I will have to automate the whole process.
Your help will be greatly appreciated. Thanks a lot!
Upvotes: 6
Views: 349
Reputation: 109874
This is called an adjacency matrix. You can do this pretty easily with the qdap package:
library(qdap)
adjmat(y[,-1])$adjacency
## a b c
## a 3 1 1
## b 1 2 1
## c 1 1 2
It throws a warning because you're feeding it a dataframe. Not a big deal and can be ignored. Also noticed I dropped the first column (ID's) with negative indexing y[, -1]
.
Note that because you started out with a Boolean matrix you could have gotten there with:
Y <- as.matrix(y[,-1])
t(Y) %*% Y
Upvotes: 3
Reputation: 162371
With base R:
crossprod(as.matrix(y[,-1]))
# a b c
# a 3 1 1
# b 1 2 1
# c 1 1 2
Upvotes: 8