Joe

R: Create a Symmetrical Matrix of Counts and Duplicates

I have data on student enrollments by class, and I'd like to compute a matrix with the total enrollment of each class on the diagonal and, off the diagonal, the number of students dual-enrolled in each pair of classes. Each line consists of a class and the name of a student enrolled in it. Here's some fake data (with famous jazz musicians):

Class   Name
A   Jones
A   Smith
A   Johnson
A   Pastorius
B   Jones
B   Davis
B   Coltrane
B   Hancock
C   Smith
C   Shorter
C   Zawinul
C   Pastorius
C   Erskine

Jones is dual-enrolled in A and B, whereas Smith and Pastorius are both in A and C. B and C have no dual enrollments. The output matrix should look like this:

    A   B   C
A   4   1   2
B   1   4   0
C   2   0   5
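The target matrix has a compact algebraic description. As a sketch (the symbols M and C are notation introduced here for illustration), let M be the 0/1 matrix with one row per student and one column per class, where M_{si} = 1 when student s is enrolled in class i. Then

$$C = M^\top M, \qquad C_{ij} = \sum_{s} M_{si}\, M_{sj}$$

On the diagonal, $C_{ii} = \sum_s M_{si}^2 = \sum_s M_{si}$ (since each entry is 0 or 1), which is the class size. Off the diagonal, the product $M_{si} M_{sj}$ is 1 only for students enrolled in both classes, so for example $C_{AC} = 2$ (Smith and Pastorius).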

Ideally, the code would work for any number of classes. I can do the counts in MySQL and R by specifying each pair of classes in the code, but I can't figure out how to generalize it to cover every class in the file. Thoughts and suggestions are greatly appreciated.
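The pair-by-pair counting described above can be generalized without naming the pairs by hand. A minimal base-R sketch, assuming the data sit in a data frame with columns Class and Name (the names enroll, rosters and counts are hypothetical): split the student names by class, then count the shared names for every pair of classes. This is a swapped-in technique for illustration, not the method used in the answers.

```r
# Hypothetical data frame holding the enrollment data from the question
enroll <- data.frame(
  Class = c("A", "A", "A", "A", "B", "B", "B", "B",
            "C", "C", "C", "C", "C"),
  Name  = c("Jones", "Smith", "Johnson", "Pastorius",
            "Jones", "Davis", "Coltrane", "Hancock",
            "Smith", "Shorter", "Zawinul", "Pastorius", "Erskine"),
  stringsAsFactors = FALSE
)

# One vector of unique student names per class, named by class
rosters <- lapply(split(enroll$Name, enroll$Class), unique)
classes <- names(rosters)

# Cell [i, j] = number of students enrolled in both class i and class j;
# when i == j this is simply the size of the class
counts <- sapply(classes, function(j)
  sapply(classes, function(i)
    length(intersect(rosters[[i]], rosters[[j]]))))

print(counts)
```

This visits all k x k pairs of classes, which is fine for a modest number of classes; a single matrix product does the same counting without an explicit loop.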

Answers (2)

Joe

Akrun, you nailed it! And thanks for the welcome message, SZenC. Yes, I realize SO isn't a code farm. I tried several approaches and all were dead ends, until, of course, right after posting the question I figured out an inelegant solution:

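# class0: data frame with one row per enrollment (columns Class and Name)
class1 <- table(class0$Name, class0$Class)  # students x classes (0/1 if no duplicate rows)
class.m <- as.data.frame.matrix(class1)     # table -> data frame
class.mm <- as.matrix(class.m)              # data frame -> plain matrix
check.m <- t(class.mm) %*% class.mm         # classes x classes: shared-student counts
check.m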

It turns out I was computing the cross product step by step, which was a helpful exercise, too.
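The step-by-step product t(X) %*% X is exactly what base R's crossprod(X) computes, and the data-frame round trip is not needed because unclass() turns the table into a plain matrix. A small check under that reading, assuming a class0 data frame like the one in the question (rebuilt here as a hypothetical stand-in):

```r
# Hypothetical stand-in for the class0 data frame used above
class0 <- data.frame(
  Class = c("A", "A", "A", "A", "B", "B", "B", "B",
            "C", "C", "C", "C", "C"),
  Name  = c("Jones", "Smith", "Johnson", "Pastorius",
            "Jones", "Davis", "Coltrane", "Hancock",
            "Smith", "Shorter", "Zawinul", "Pastorius", "Erskine"),
  stringsAsFactors = FALSE
)

# Students x classes table, stripped to a plain integer matrix
m <- unclass(table(class0$Name, class0$Class))

step_by_step <- t(m) %*% m    # the multiplication written out
one_call     <- crossprod(m)  # crossprod(m) computes t(m) %*% m

# Same numbers (dimnames attributes are not compared)
stopifnot(isTRUE(all.equal(step_by_step, one_call, check.attributes = FALSE)))
print(one_call)
```

With the table built the other way round (classes x students, as table(class0) does), the matching call is tcrossprod(), which computes X %*% t(X).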

Joe

akrun

We can use table with tcrossprod:

tcrossprod(table(df1))
#       Class
#Class A B C
#    A 4 1 2
#    B 1 4 0
#    C 2 0 5

data

df1 <- structure(list(
    Class = c("A", "A", "A", "A", "B", "B", "B", "B",
              "C", "C", "C", "C", "C"),
    Name = c("Jones", "Smith", "Johnson", "Pastorius", "Jones", "Davis",
             "Coltrane", "Hancock", "Smith", "Shorter", "Zawinul",
             "Pastorius", "Erskine")),
  .Names = c("Class", "Name"), class = "data.frame",
  row.names = c(NA, -13L))
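One caveat worth checking: table() counts rows, so if the same student is listed twice for the same class, that cell becomes 2 and the cross product squares it. A sketch assuming such a repeated row (the extra Jones row and the names df2, inflated and fixed are hypothetical), with unique() dropping exact duplicate rows before tabulating:

```r
# Same data as df1 above, rebuilt with data.frame() so the block runs alone
df1 <- data.frame(
  Class = c("A", "A", "A", "A", "B", "B", "B", "B",
            "C", "C", "C", "C", "C"),
  Name  = c("Jones", "Smith", "Johnson", "Pastorius",
            "Jones", "Davis", "Coltrane", "Hancock",
            "Smith", "Shorter", "Zawinul", "Pastorius", "Erskine"),
  stringsAsFactors = FALSE
)

# Hypothetical repeated row: Jones listed twice in class A
df2 <- rbind(df1, data.frame(Class = "A", Name = "Jones",
                             stringsAsFactors = FALSE))

inflated <- tcrossprod(table(df2))        # A/A becomes 7 and A/B becomes 2
fixed    <- tcrossprod(table(unique(df2))) # duplicate row dropped first

print(inflated)
print(fixed)
```

Converting the table to 0/1 first, for example with (table(df2) > 0) * 1, is another way to get the same guard.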
