Carol Eisen
Carol Eisen

Reputation: 608

Cross Table / Tabular in R with dplyr

My Data

Data FactorA FactorB FactorC
D1 Yes Yes No
D2 No No Yes
D1 Weak No No No
D2 No Yes No
D1 Weak Yes No No
D2 No No No
D1 No No Yes
D2 Weak Yes No No
D1 Weak No No Yes
D2 No No No

What I want

And would like to get a table like this:

FactorA FactorB FactorC
No 1 2
Weak No 0 1
Weak Yes 0 0
Yes 1 0

which counts the pairwise co-occurances of each level of FactorA with "Yes" of FactorB and FactorC. Preferably once, overall and grouped by Data.

What I have

df %>% 
    group_by(Dataset) %>%
    group_by(FactorA, FactorB) %>% 
    summarise(num = n()) %>%
    spread(FactorB, num)

which return

# A tibble: 4 x 3
# Groups:   FactorA [4]
  FactorA    No   Yes
  <fct>         <int> <int>
1 No             1092    36
2 Weakly No       684    NA
3 Weakly Yes     2388    60
4 Yes            9660   216

(numbers in output are taken from the real data, not toy data)

Question

Is there a sleek way to get the table I want with multiple Factors in dplyr style s.t. I can simply split it by Data later.

Upvotes: 0

Views: 2184

Answers (3)

G. Grothendieck
G. Grothendieck

Reputation: 269526

1) Base R Compare each of the columns except the first two to Yes and grouping that by the second column sum them. The result is the following one-liner. No packages are used. It ran nearly 3x faster than a dplyr solution when I benchmarked it.

# column 1 is Data, column 2 is FactorA, rest are other factors
aggregate(DF[-(1:2)] == "Yes", DF[2], sum)

giving:

   FactorA FactorB FactorC
1       No       1       2
2  Weak No       0       1
3 Weak Yes       0       0
4      Yes       1       0

2) collapse A similar approach which likely runs faster is to use collap in the collapse package. It ran 19x faster than a dplyr solution when I benchmarked it.

library(collapse)

collap(+(slt(DF, FactorB:FactorC) == "Yes"), DF$FactorA, fsum)

giving:

   FactorA FactorB FactorC
1       No       1       2
2  Weak No       0       1
3 Weak Yes       0       0
4      Yes       1       0

Note

The input in reproducible form:

DF <- structure(list(Data = c("D1", "D2", "D1", "D2", "D1", "D2", "D1", 
"D2", "D1", "D2"), FactorA = c("Yes", "No", "Weak No", "No", 
"Weak Yes", "No", "No", "Weak Yes", "Weak No", "No"), FactorB = c("Yes", 
"No", "No", "Yes", "No", "No", "No", "No", "No", "No"), FactorC = c("No", 
"Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No")), class = "data.frame", row.names = c(NA, -10L))

Upvotes: 0

Yuriy Saraykin
Yuriy Saraykin

Reputation: 8880

base

df <- structure(list(Data = c("D1", "D2", "D1", "D2", "D1", "D2", "D1", 
                              "D2", "D1", "D2"), FactorA = c("Yes", "No", "Weak No", "No", 
                                                             "Weak Yes", "No", "No", "Weak Yes", "Weak No", "No"), FactorB = c("Yes", 
                                                                                                                               "No", "No", "Yes", "No", "No", "No", "No", "No", "No"), FactorC = c("No", 
                                                                                                                                                                                                   "Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                                                                                                  -10L))



sapply(df[3:4], function(x) table(df$FactorA, x)[, 2])
#>          FactorB FactorC
#> No             1       2
#> Weak No        0       1
#> Weak Yes       0       0
#> Yes            1       0

Created on 2021-03-16 by the reprex package (v1.0.0)

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388962

group_by FactorA and count number of Yes in FactorB and FactorC columns.

library(dplyr)

df %>%
  group_by(FactorA) %>%
  summarise(across(FactorB:FactorC, ~sum(. == 'Yes')))

#  FactorA  FactorB FactorC
#* <chr>      <int>   <int>
#1 No             1       2
#2 Weak No        0       1
#3 Weak Yes       0       0
#4 Yes            1       0

Upvotes: 2

Related Questions