Nikolay Nenov

Reputation: 587

Count the number of instances where a variable or a combination of variables is TRUE

I'm an enthusiastic R newbie who needs some help! :)

I have a data frame that looks like this:

id<-c(100,200,300,400)
a<-c(1,1,0,1)
b<-c(1,0,1,0)
c<-c(0,0,1,1)

y=data.frame(id=id,a=a,b=b,c=c)

Here id is a unique identifier (e.g. a person), and a, b and c are dummy variables indicating whether the person has that feature or not (as always, 1=TRUE).

I want R to create a matrix or data frame that has the variables a, b and c as both the column names and the row names. For the values of the matrix, R will have to calculate the number of identifiers that have a given feature, or a given combination of features.

So, for example, IDs 100, 200 and 400 have feature a, so on the diagonal of the matrix, where a and a cross, R will input 3. Only ID 100 has both features a and b, hence R will input 1 where a and b cross, and so forth.

The resulting data frame will have to look like this:

l<-c("","a","b","c")
m<-c("a",3,1,1)
n<-c("b",1,2,1)
o<-c("c",1,1,2)
result<-matrix(c(l,m,n,o),nrow=4,ncol=4)

As my data set has 10 variables and hundreds of observations, I will have to automate the whole process.

Your help will be greatly appreciated. Thanks a lot!

Upvotes: 6

Views: 349

Answers (2)

Tyler Rinker

Reputation: 109874

This is called an adjacency matrix. You can do this pretty easily with the qdap package:

library(qdap)
adjmat(y[,-1])$adjacency

##   a b c
## a 3 1 1
## b 1 2 1
## c 1 1 2

It throws a warning because you're feeding it a data frame. That's not a big deal and can be ignored. Also notice that I dropped the first column (the IDs) with negative indexing, y[, -1].

Note that because you started out with a Boolean matrix you could have gotten there with:

Y <- as.matrix(y[,-1])
t(Y) %*% Y
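
The reason the product works: entry (i, j) of t(Y) %*% Y is the sum over rows of Y[, i] * Y[, j], and that product is 1 only for rows where both features are 1. Here is a minimal sketch that checks this against an explicit count on the question's data; the names `explicit` and `product` are just for illustration:

```r
# Data from the question
y <- data.frame(id = c(100, 200, 300, 400),
                a  = c(1, 1, 0, 1),
                b  = c(1, 0, 1, 0),
                c  = c(0, 0, 1, 1))
Y <- as.matrix(y[, -1])
vars <- colnames(Y)

# Explicit count: for every pair of features, count the rows where both are 1
explicit <- sapply(vars, function(j) {
  sapply(vars, function(i) sum(Y[, i] == 1 & Y[, j] == 1))
})

# Same counts via the matrix product
product <- t(Y) %*% Y

# Should be TRUE if the two approaches agree
isTRUE(all.equal(explicit, product))
```

The explicit version loops over every pair of columns, so the matrix product is the more practical choice once there are many variables.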

Upvotes: 3

Josh O'Brien

Reputation: 162371

With base R:

crossprod(as.matrix(y[,-1]))
#   a b c
# a 3 1 1
# b 1 2 1
# c 1 1 2
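
With more columns, it may be safer to pick the feature columns by name instead of by position. A rough sketch, assuming every column other than id is a 0/1 feature with no missing values (the names `feature_cols` and `counts` are illustrative, not from the question):

```r
# Data from the question
y <- data.frame(id = c(100, 200, 300, 400),
                a  = c(1, 1, 0, 1),
                b  = c(1, 0, 1, 0),
                c  = c(0, 0, 1, 1))

# Assumption: all columns except "id" are 0/1 dummy variables
feature_cols <- setdiff(names(y), "id")
counts <- crossprod(as.matrix(y[feature_cols]))

diag(counts)       # number of ids that have each single feature
counts["a", "b"]   # number of ids that have both a and b
```

If the real data contains NA values, they will propagate into the counts, so they would need to be handled (for example, recoded) before calling crossprod.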

Upvotes: 8
