Reputation: 582
I have this dataframe:
df <- structure(list(`Product Name` = c("Apple", "Banana", "Cherry", "Date", "Elderberry", "Fig", "Guava", "Honeydew", "Kiwi", "Lemon"), Benefits_Claim = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE), Warnings_Claim = c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE), Instructions_Claim = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE), Features_Claim = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
I want to create a contingency table that counts, for each pair of claim columns, how many products have TRUE in both. I'm not sure that's the best way to put it, but ideally it should look something like this:
| | Benefits | Warnings | Instructions | Features |
|---|---|---|---|---|
| Benefits | | | | |
| Warnings | | | | |
| Instructions | | | | |
| Features | | | | |
Each cell should show the number of fruits for which both columns are TRUE. For example, the cell where Warnings and Benefits meet should show the count of fruits that have both Benefits_Claim and Warnings_Claim set to TRUE, and so on.
I don't have much code to show for this, as I don't really understand how to approach it. Any advice gratefully received.
Upvotes: 0
Views: 109
Reputation: 34751
You want the matrix crossproduct of your table:
crossprod(as.matrix(df[-1]))
                   Benefits_Claim Warnings_Claim Instructions_Claim Features_Claim
Benefits_Claim                  3              2                  2              3
Warnings_Claim                  2              9                  8              3
Instructions_Claim              2              8                  8              3
Features_Claim                  3              3                  3              4
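To see why this works: `crossprod(m)` computes `t(m) %*% m`, and since TRUE/FALSE are treated as 1/0, each entry is the number of rows where both columns are TRUE. A minimal sketch on a made-up two-column matrix (the names `m`, `A` and `B` are illustrative, not taken from the question):

```r
# Toy logical matrix: 3 rows (items), 2 columns (claims); values are made up
m <- cbind(A = c(TRUE, TRUE, FALSE),
           B = c(TRUE, FALSE, FALSE))

# crossprod(m) is t(m) %*% m; logicals are coerced to 0/1
cp <- crossprod(m)

# The same counts computed explicitly for every pair of columns
manual <- sapply(colnames(m), function(j) {
  sapply(colnames(m), function(i) sum(m[, i] & m[, j]))
})

# Both approaches should agree cell by cell
all(cp == manual)
```

The diagonal of the result is each column paired with itself, so it simply holds that column's own TRUE count.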
The diagonal holds each column's own TRUE count. With the `_Claim` suffix stripped and the diagonal set to zero:
df |>
  setNames(sub("_Claim$", "", names(df))) |>
  subset(select = -`Product Name`) |>
  as.matrix() |>
  crossprod() |>
  `diag<-`(0)
             Benefits Warnings Instructions Features
Benefits            0        2            2        3
Warnings            2        0            8        3
Instructions        2        8            0        3
Features            3        3            3        0
Upvotes: 1
Reputation: 389315
Here's one way to do this in base R with nested `sapply` calls:
sapply(df[-1], \(x) sapply(df[-1], \(y) sum(x & y)))
#                   Benefits_Claim Warnings_Claim Instructions_Claim Features_Claim
#Benefits_Claim                  3              2                  2              3
#Warnings_Claim                  2              9                  8              3
#Instructions_Claim              2              8                  8              3
#Features_Claim                  3              3                  3              4
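As a sanity check, the nested `sapply` and a plain `crossprod` should give the same counts. A rough sketch, assuming a plain data.frame is an acceptable stand-in for the question's tibble (the names `claims`, `by_sapply` and `by_crossprod` are mine, not from the question):

```r
# Rebuild the question's data as a plain data.frame (same values as the dput)
df <- data.frame(
  `Product Name` = c("Apple", "Banana", "Cherry", "Date", "Elderberry",
                     "Fig", "Guava", "Honeydew", "Kiwi", "Lemon"),
  Benefits_Claim = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  Warnings_Claim = c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  Instructions_Claim = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  Features_Claim = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE),
  check.names = FALSE  # keep the space in `Product Name`
)

claims <- df[-1]  # drop the name column, keep the logical columns

# Pairwise counts via nested sapply (needs R >= 4.1 for the \(x) syntax)
by_sapply <- sapply(claims, \(x) sapply(claims, \(y) sum(x & y)))

# Pairwise counts via the matrix crossproduct
by_crossprod <- crossprod(as.matrix(claims))

# Compare cell by cell
all(by_sapply == by_crossprod)
```

The `sapply` version spells out the "both TRUE" logic directly, while `crossprod` does the same arithmetic in a single matrix operation.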
Upvotes: 1