Sebastian Hesse
Sebastian Hesse

Reputation: 545

Summarise table with boolean columns in R

A table contains 3 columns. The first col contains all items (as.characters) in groupA. The second and third col contain Boolean vectors, indicating if the groupB and groupC also contain the item (TRUE) or not (FALSE).

How do you summarise the table to known how many of the items present in groupA are present or absent in groupB and groupC?

In the results table, it should be counted how often groupA contains an entry not present in B or C (FALSE, FALSE), present in both or present in either B or C. The original table has thousands of entries, not just the few from the example data.

A dplyr solution would be appreciated.

Example data:

example <- data.frame(
 groupA = c('shshs', 'ihdfeowf', 'woefiewfh', 'awofjqweofj', 'hdhd', 'dudj'),
 groupB = c(T, T ,F, F, T, F),
 groupC = c(T, F, F, T, T, T))

expected_result <- data.frame(
 groupA = c(1, 2, 1, 2),
 groupB = c(F, T, T, F),
 groupC = c(F, T, F, T))

Upvotes: 0

Views: 90

Answers (3)

Eric Lecoutre
Eric Lecoutre

Reputation: 1481

Here is what you could use.

example %>% 
group_by(groupB, groupC) %>% 
summarize(groupA = n()) %>% 
ungroup() %>% 
select(groupA, groupB, groupC)

With following output (as tibble as we use dplyr but you can convert to traditional data.frame).

# A tibble: 4 × 3
  groupA groupB groupC
  <int> <lgl>  <lgl> 
1      1 FALSE  FALSE 
2      2 FALSE  TRUE  
3      1 TRUE   FALSE 
4      2 TRUE   TRUE 

Upvotes: 1

Talha Asif
Talha Asif

Reputation: 401

Using dplyr to summarize the data frame.

library(dplyr)

example %>% 
  mutate(groupB = ifelse(groupB, "True", "False"),
         groupC = ifelse(groupC, "True", "False")) %>% 
  group_by(groupB, groupC) %>% 
  summarize(groupA = n()) %>% 
  ungroup() %>% 
  mutate(groupA = c(1, 2, 1, 2)) %>% 
  arrange(groupB, groupC) %>% 
  select(groupA, groupB, groupC)

Expected Output

  groupA groupB groupC
   <dbl> <chr>  <chr> 
1      1 False  False 
2      2 False  True  
3      1 True   False 
4      2 True   True  

Upvotes: 0

Allan Cameron
Allan Cameron

Reputation: 173793

You can do:

table(example$groupB, example$groupC) |>
  as.data.frame() |>
  (`[`)(c(3, 1, 2)) |>
  setNames(names(example))
#>   groupA groupB groupC
#> 1      1  FALSE  FALSE
#> 2      1   TRUE  FALSE
#> 3      2  FALSE   TRUE
#> 4      2   TRUE   TRUE

Or, if you are using R < 4.1.0, then you could do:

setNames(as.data.frame(table(example$groupB, example$groupC))[c(3, 1, 2)],
        names(example))
#>   groupA groupB groupC
#> 1      1  FALSE  FALSE
#> 2      1   TRUE  FALSE
#> 3      2  FALSE   TRUE
#> 4      2   TRUE   TRUE

Upvotes: 0

Related Questions