Reputation: 608
Data | FactorA | FactorB | FactorC |
---|---|---|---|
D1 | Yes | Yes | No |
D2 | No | No | Yes |
D1 | Weak No | No | No |
D2 | No | Yes | No |
D1 | Weak Yes | No | No |
D2 | No | No | No |
D1 | No | No | Yes |
D2 | Weak Yes | No | No |
D1 | Weak No | No | Yes |
D2 | No | No | No |
And would like to get a table like this:
FactorA | FactorB | FactorC |
---|---|---|
No | 1 | 2 |
Weak No | 0 | 1 |
Weak Yes | 0 | 0 |
Yes | 1 | 0 |
which counts the pairwise co-occurances of each level of FactorA
with "Yes" of FactorB
and FactorC
. Preferably once, overall and grouped by Data
.
df %>%
group_by(Dataset) %>%
group_by(FactorA, FactorB) %>%
summarise(num = n()) %>%
spread(FactorB, num)
which return
# A tibble: 4 x 3
# Groups: FactorA [4]
FactorA No Yes
<fct> <int> <int>
1 No 1092 36
2 Weakly No 684 NA
3 Weakly Yes 2388 60
4 Yes 9660 216
(numbers in output are taken from the real data, not toy data)
Is there a sleek way to get the table I want with multiple Factors in dplyr
style s.t. I can simply split it by Data
later.
Upvotes: 0
Views: 2184
Reputation: 269526
1) Base R Compare each of the columns except the first two to Yes and grouping that by the second column sum them. The result is the following one-liner. No packages are used. It ran nearly 3x faster than a dplyr solution when I benchmarked it.
# column 1 is Data, column 2 is FactorA, rest are other factors
aggregate(DF[-(1:2)] == "Yes", DF[2], sum)
giving:
FactorA FactorB FactorC
1 No 1 2
2 Weak No 0 1
3 Weak Yes 0 0
4 Yes 1 0
2) collapse A similar approach which likely runs faster is to use collap in the collapse package. It ran 19x faster than a dplyr solution when I benchmarked it.
library(collapse)
collap(+(slt(DF, FactorB:FactorC) == "Yes"), DF$FactorA, fsum)
giving:
FactorA FactorB FactorC
1 No 1 2
2 Weak No 0 1
3 Weak Yes 0 0
4 Yes 1 0
The input in reproducible form:
DF <- structure(list(Data = c("D1", "D2", "D1", "D2", "D1", "D2", "D1",
"D2", "D1", "D2"), FactorA = c("Yes", "No", "Weak No", "No",
"Weak Yes", "No", "No", "Weak Yes", "Weak No", "No"), FactorB = c("Yes",
"No", "No", "Yes", "No", "No", "No", "No", "No", "No"), FactorC = c("No",
"Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No")), class = "data.frame", row.names = c(NA, -10L))
Upvotes: 0
Reputation: 8880
base
df <- structure(list(Data = c("D1", "D2", "D1", "D2", "D1", "D2", "D1",
"D2", "D1", "D2"), FactorA = c("Yes", "No", "Weak No", "No",
"Weak Yes", "No", "No", "Weak Yes", "Weak No", "No"), FactorB = c("Yes",
"No", "No", "Yes", "No", "No", "No", "No", "No", "No"), FactorC = c("No",
"Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No")), class = "data.frame", row.names = c(NA,
-10L))
sapply(df[3:4], function(x) table(df$FactorA, x)[, 2])
#> FactorB FactorC
#> No 1 2
#> Weak No 0 1
#> Weak Yes 0 0
#> Yes 1 0
Created on 2021-03-16 by the reprex package (v1.0.0)
Upvotes: 0
Reputation: 388962
group_by
FactorA
and count number of Yes
in FactorB
and FactorC
columns.
library(dplyr)
df %>%
group_by(FactorA) %>%
summarise(across(FactorB:FactorC, ~sum(. == 'Yes')))
# FactorA FactorB FactorC
#* <chr> <int> <int>
#1 No 1 2
#2 Weak No 0 1
#3 Weak Yes 0 0
#4 Yes 1 0
Upvotes: 2