Omry Atia

Reputation: 2443

confusion between categories in dplyr

I have the following data frame, which lists the conditions each patient has (a patient can have more than one):

df <- structure(list(patient = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 
6, 7, 7, 8, 8, 9, 9, 10), condition = c("A", "A", "B", "B", "D", 
"C", "A", "C", "C", "B", "D", "B", "A", "A", "C", "B", "C", "D", 
"C", "D")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", 
"data.frame"))

I would like to create a "confusion matrix", which in this case is a 4x4 matrix where the AxA cell has the value 5 (5 patients have condition A), the AxB cell has the value 2 (two patients have both A and B), and so on.

How can I achieve this?

Upvotes: 0

Views: 50

Answers (2)

Pete Kittinun

Reputation: 603

You can join the table to itself on patient and then cross-tabulate the two resulting condition columns.

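library(dplyr)

# Self-join on patient: each pair of conditions held by the same patient becomes one row
# (dplyr >= 1.1.0 warns about the many-to-many match here; that is expected)
df2 <- inner_join(df, df, by = "patient")

# Cross-tabulate the paired conditions
table(df2$condition.x, df2$condition.y)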

    A B C D
  A 5 2 2 1
  B 2 5 3 2
  C 2 3 6 2
  D 1 2 2 4
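
The same counts can also be read as a matrix product, which may help explain why the self-join works. The following is a minimal sketch rather than part of the answer above: it assumes each patient/condition pair appears at most once (true for the data in the question), and the names `m` and `co` are only illustrative.

```r
# Same data as in the question, rebuilt as a plain data.frame
df <- data.frame(
  patient = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10),
  condition = c("A", "A", "B", "B", "D", "C", "A", "C", "C", "B",
                "D", "B", "A", "A", "C", "B", "C", "D", "C", "D")
)

# Patient-by-condition indicator matrix: 1 if the patient has the condition, else 0
# (assumption: no duplicated patient/condition rows, otherwise entries exceed 1)
m <- table(df$patient, df$condition)

# crossprod(m) is t(m) %*% m: cell [i, j] counts patients with both condition i and j
co <- crossprod(m)
co
```

The diagonal holds the number of patients per condition, and the off-diagonal cells hold the pairwise overlaps, just as in the table above.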

Upvotes: 2

Ronak Shah

Reputation: 388982

Here is a base R answer using outer():

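# Number of patients that have both condition x and condition y
count_patient <- function(x, y) {
  length(intersect(df$patient[df$condition == x],
                   df$patient[df$condition == y]))
}
vec <- sort(unique(df$condition))
# outer() passes whole vectors to its function, hence Vectorize()
res <- outer(vec, vec, Vectorize(count_patient))
dimnames(res) <- list(vec, vec)
res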

#  A B C D
#A 5 2 2 1
#B 2 5 3 2
#C 2 3 6 2
#D 1 2 2 4
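
If relying on a global `df` inside the helper is undesirable, one possible variation is to pass the data in explicitly. This is only a sketch under that assumption: `count_shared` and its `data` parameter are hypothetical names, not part of the answer above. It relies on `outer()` forwarding extra arguments to its function and on `Vectorize()` accepting `vectorize.args`.

```r
# Same data as in the question, rebuilt as a plain data.frame
df <- data.frame(
  patient = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10),
  condition = c("A", "A", "B", "B", "D", "C", "A", "C", "C", "B",
                "D", "B", "A", "A", "C", "B", "C", "D", "C", "D")
)

# Hypothetical helper: same logic as count_patient, but the data is an argument
count_shared <- function(x, y, data) {
  length(intersect(data$patient[data$condition == x],
                   data$patient[data$condition == y]))
}

vec <- sort(unique(df$condition))

# Vectorize only over x and y; data is forwarded unchanged through outer()'s ...
res <- outer(vec, vec,
             Vectorize(count_shared, vectorize.args = c("x", "y")),
             data = df)
dimnames(res) <- list(vec, vec)
res
```

Restricting `vectorize.args` to `x` and `y` keeps the data frame from being iterated over column by column.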

Upvotes: 1
