Reputation: 608

Cross Table / Tabular in R with dplyr

My Data

Data	FactorA	FactorB	FactorC
D1	Yes	Yes	No
D2	No	No	Yes
D1	Weak No	No	No
D2	No	Yes	No
D1	Weak Yes	No	No
D2	No	No	No
D1	No	No	Yes
D2	Weak Yes	No	No
D1	Weak No	No	Yes
D2	No	No	No

What I want

And would like to get a table like this:

FactorA	FactorB	FactorC
No	1	2
Weak No	0	1
Weak Yes	0	0
Yes	1	0

which counts the pairwise co-occurances of each level of FactorA with "Yes" of FactorB and FactorC. Preferably once, overall and grouped by Data.

What I have

df %>% 
    group_by(Dataset) %>%
    group_by(FactorA, FactorB) %>% 
    summarise(num = n()) %>%
    spread(FactorB, num)

which return

# A tibble: 4 x 3
# Groups:   FactorA [4]
  FactorA    No   Yes
  <fct>         <int> <int>
1 No             1092    36
2 Weakly No       684    NA
3 Weakly Yes     2388    60
4 Yes            9660   216

(numbers in output are taken from the real data, not toy data)

Question

Is there a sleek way to get the table I want with multiple Factors in dplyr style s.t. I can simply split it by Data later.

Upvotes: 0

Answers (3)

G. Grothendieck

Reputation: 270248

1) Base R Compare each of the columns except the first two to Yes and grouping that by the second column sum them. The result is the following one-liner. No packages are used. It ran nearly 3x faster than a dplyr solution when I benchmarked it.

# column 1 is Data, column 2 is FactorA, rest are other factors
aggregate(DF[-(1:2)] == "Yes", DF[2], sum)

giving:

   FactorA FactorB FactorC
1       No       1       2
2  Weak No       0       1
3 Weak Yes       0       0
4      Yes       1       0

2) collapse A similar approach which likely runs faster is to use collap in the collapse package. It ran 19x faster than a dplyr solution when I benchmarked it.

library(collapse)

collap(+(slt(DF, FactorB:FactorC) == "Yes"), DF$FactorA, fsum)

giving:

   FactorA FactorB FactorC
1       No       1       2
2  Weak No       0       1
3 Weak Yes       0       0
4      Yes       1       0

Note

The input in reproducible form:

DF <- structure(list(Data = c("D1", "D2", "D1", "D2", "D1", "D2", "D1", 
"D2", "D1", "D2"), FactorA = c("Yes", "No", "Weak No", "No", 
"Weak Yes", "No", "No", "Weak Yes", "Weak No", "No"), FactorB = c("Yes", 
"No", "No", "Yes", "No", "No", "No", "No", "No", "No"), FactorC = c("No", 
"Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No")), class = "data.frame", row.names = c(NA, -10L))

Upvotes: 0

Yuriy Saraykin

Reputation: 8880

base

df <- structure(list(Data = c("D1", "D2", "D1", "D2", "D1", "D2", "D1", 
                              "D2", "D1", "D2"), FactorA = c("Yes", "No", "Weak No", "No", 
                                                             "Weak Yes", "No", "No", "Weak Yes", "Weak No", "No"), FactorB = c("Yes", 
                                                                                                                               "No", "No", "Yes", "No", "No", "No", "No", "No", "No"), FactorC = c("No", 
                                                                                                                                                                                                   "Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                                                                                                  -10L))



sapply(df[3:4], function(x) table(df$FactorA, x)[, 2])
#>          FactorB FactorC
#> No             1       2
#> Weak No        0       1
#> Weak Yes       0       0
#> Yes            1       0

^{Created on 2021-03-16 by the reprex package (v1.0.0)}

Upvotes: 0

Ronak Shah

Reputation: 389275

group_by FactorA and count number of Yes in FactorB and FactorC columns.

library(dplyr)

df %>%
  group_by(FactorA) %>%
  summarise(across(FactorB:FactorC, ~sum(. == 'Yes')))

#  FactorA  FactorB FactorC
#* <chr>      <int>   <int>
#1 No             1       2
#2 Weak No        0       1
#3 Weak Yes       0       0
#4 Yes            1       0

Upvotes: 2

Cross Table / Tabular in R with dplyr

My Data

What I want

What I have

Question

Answers (3)

Note

Related Questions