Reputation: 545
A table contains 3 columns. The first col contains all items (as.characters) in groupA. The second and third col contain Boolean vectors, indicating if the groupB and groupC also contain the item (TRUE) or not (FALSE).
How do you summarise the table to known how many of the items present in groupA are present or absent in groupB and groupC?
In the results table, it should be counted how often groupA contains an entry not present in B or C (FALSE, FALSE), present in both or present in either B or C. The original table has thousands of entries, not just the few from the example data.
A dplyr solution would be appreciated.
Example data:
example <- data.frame(
groupA = c('shshs', 'ihdfeowf', 'woefiewfh', 'awofjqweofj', 'hdhd', 'dudj'),
groupB = c(T, T ,F, F, T, F),
groupC = c(T, F, F, T, T, T))
expected_result <- data.frame(
groupA = c(1, 2, 1, 2),
groupB = c(F, T, T, F),
groupC = c(F, T, F, T))
Upvotes: 0
Views: 90
Reputation: 1481
Here is what you could use.
example %>%
group_by(groupB, groupC) %>%
summarize(groupA = n()) %>%
ungroup() %>%
select(groupA, groupB, groupC)
With following output (as tibble as we use dplyr but you can convert to traditional data.frame).
# A tibble: 4 × 3
groupA groupB groupC
<int> <lgl> <lgl>
1 1 FALSE FALSE
2 2 FALSE TRUE
3 1 TRUE FALSE
4 2 TRUE TRUE
Upvotes: 1
Reputation: 401
Using dplyr
to summarize the data frame.
library(dplyr)
example %>%
mutate(groupB = ifelse(groupB, "True", "False"),
groupC = ifelse(groupC, "True", "False")) %>%
group_by(groupB, groupC) %>%
summarize(groupA = n()) %>%
ungroup() %>%
mutate(groupA = c(1, 2, 1, 2)) %>%
arrange(groupB, groupC) %>%
select(groupA, groupB, groupC)
Expected Output
groupA groupB groupC
<dbl> <chr> <chr>
1 1 False False
2 2 False True
3 1 True False
4 2 True True
Upvotes: 0
Reputation: 173793
You can do:
table(example$groupB, example$groupC) |>
as.data.frame() |>
(`[`)(c(3, 1, 2)) |>
setNames(names(example))
#> groupA groupB groupC
#> 1 1 FALSE FALSE
#> 2 1 TRUE FALSE
#> 3 2 FALSE TRUE
#> 4 2 TRUE TRUE
Or, if you are using R < 4.1.0, then you could do:
setNames(as.data.frame(table(example$groupB, example$groupC))[c(3, 1, 2)],
names(example))
#> groupA groupB groupC
#> 1 1 FALSE FALSE
#> 2 1 TRUE FALSE
#> 3 2 FALSE TRUE
#> 4 2 TRUE TRUE
Upvotes: 0