Jay Bee

Reputation: 582

How do I create a contingency table of overlapping categories?

I have this dataframe:

df <- structure(list(
  `Product Name` = c("Apple", "Banana", "Cherry", "Date", "Elderberry", "Fig", "Guava", "Honeydew", "Kiwi", "Lemon"),
  Benefits_Claim = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  Warnings_Claim = c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  Instructions_Claim = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  Features_Claim = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)
), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

I want to create a contingency table that shows how often the claim types overlap. I'm not sure that's the best way to put it, but ideally it should look something like this:

             Benefits Warnings Instructions Features
Benefits
Warnings
Instructions
Features

Each cell should hold the number of fruits for which both claims are TRUE. For example, the Warnings/Benefits cell should count the fruits that have both a Benefits and a Warnings value of TRUE, and so on.

I don't have much code to show for this, as I don't really understand how to approach it. Any advice gratefully received.

Upvotes: 0

Views: 109

Answers (2)

Iroha

Reputation: 34751

You want the matrix crossproduct of your table. The logical columns are treated as 0/1, so each cell counts the rows where both columns are TRUE:

crossprod(as.matrix(df[-1]))

                   Benefits_Claim Warnings_Claim Instructions_Claim Features_Claim
Benefits_Claim                  3              2                  2              3
Warnings_Claim                  2              9                  8              3
Instructions_Claim              2              8                  8              3
Features_Claim                  3              3                  3              4

With the `_Claim` suffix stripped from the names and the diagonal set to zero:

df |>
  setNames(sub("_Claim$", "", names(df))) |>
  subset(select = -`Product Name`) |>
  as.matrix() |>
  crossprod() |>
  `diag<-`(0)

             Benefits Warnings Instructions Features
Benefits            0        2            2        3
Warnings            2        0            8        3
Instructions        2        8            0        3
Features            3        3            3        0
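
To see why the crossproduct yields overlap counts, here is a minimal sketch that checks one cell by hand. It rebuilds the question's data without the product names and with the `_Claim` suffix dropped; the names `claims`, `m`, `overlap` and `by_hand` are illustrative and not part of the original answer.

```r
# Data from the question, rebuilt as a plain data.frame
# (product names omitted, `_Claim` suffix dropped for brevity).
claims <- data.frame(
  Benefits     = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  Warnings     = c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  Instructions = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  Features     = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Adding 0 makes the TRUE/FALSE -> 1/0 coercion explicit.
m <- as.matrix(claims) + 0

# crossprod(m) is t(m) %*% m: cell [i, j] sums m[, i] * m[, j] over the rows,
# and that product is 1 only when both entries are 1 (both claims TRUE).
overlap <- crossprod(m)

# One cell checked by hand: fruits with both a benefit and a warning claim.
by_hand <- sum(claims$Benefits & claims$Warnings)

stopifnot(overlap["Benefits", "Warnings"] == by_hand)
stopifnot(all(overlap == t(m) %*% m))
```

The diagonal of `overlap` pairs each column with itself, so it holds the plain TRUE count per claim type, which is why zeroing it is a reasonable cleanup when only the overlaps matter.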

Upvotes: 1

Ronak Shah

Reputation: 389315

Here's one way to do this in base R, counting the rows where both columns are TRUE for every pair of columns:

sapply(df[-1], \(x) sapply(df[-1], \(y) sum(x & y)))

#                   Benefits_Claim Warnings_Claim Instructions_Claim Features_Claim
#Benefits_Claim                  3              2                  2              3
#Warnings_Claim                  2              9                  8              3
#Instructions_Claim              2              8                  8              3
#Features_Claim                  3              3                  3              4
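
As a quick sanity check, the nested `sapply()` can be compared against the crossproduct approach from the other answer. This is only a sketch on the question's data (product names omitted, `_Claim` suffix dropped); `claims`, `by_pairs` and `by_crossprod` are illustrative names, and the `\(x)` lambda syntax assumes R 4.1 or later.

```r
# Data from the question, rebuilt as a plain data.frame
# (product names omitted, `_Claim` suffix dropped for brevity).
claims <- data.frame(
  Benefits     = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  Warnings     = c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  Instructions = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  Features     = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Nested sapply: for every pair of columns (x, y), count rows where both are TRUE.
by_pairs <- sapply(claims, \(x) sapply(claims, \(y) sum(x & y)))

# Same counts via the matrix crossproduct.
by_crossprod <- crossprod(as.matrix(claims) + 0)

# Both approaches agree cell by cell.
stopifnot(all(by_pairs == by_crossprod))

# The diagonal pairs each column with itself, so it equals the column totals.
stopifnot(all(diag(by_pairs) == colSums(claims)))
```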

Upvotes: 1
