Rohit
Rohit

Reputation: 71

Counting "Yes" in multiple columns in a data-frame using dplyr

Suppose I am having the following data. [As requested I am adding the data]

col1 <- c("Team A", "Team A", "Team A", "Team B", "Team B", "Team B", "Team C", "Team C", "Team C", "Team D", "Team D", "Team D")
col2 <- c("High",   "Medium", "Medium", "Low", "Low", "Low", "High", "Medium", "Low", "Medium", "Medium", "Medium")
col3 <- c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes")
col4 <- c("No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes")
df <- data.frame(col1, col2, col3, col4)
# Col1      Col2    Col3    Col4
# Team A    High    Yes     No
# Team A    Medium  Yes     Yes
# Team A    Medium  No      No
# Team B    Low     No      Yes
# Team B    Low     No      Yes
# Team B    Low     Yes     No
# Team C    High    No      No
# Team C    Medium  Yes    Yes
# Team C    Low     No     No 
# Team D    Medium  Yes    Yes
# Team D    Medium  Yes    No
# Team D    Medium  Yes    Yes

I want to use dplyr function to get the following result. Status_1 would need to be the count of the number of "Yes" in Col3 against each team while Status_2 would be the count of "Yes" in Col4 against each team

       High Medium  Low Status_1    Status_2
Team A    1      2    0        2           1
Team B    0      0    3        1           2
Team C    1      1    1        1           1
Team D    0      3    0        3           2

I am able to generate the normal summary but for the last two columns of "Status_1" and "Status_2" using the following statement. Can anyone help, please?

df %>%
  group_by(Col1, Col2) %>%
  summarise(Count = n()) %>%
  spread(Col1, Count, fill = 0)

Upvotes: 4

Views: 3003

Answers (2)

Darren Tsai
Darren Tsai

Reputation: 35554

First, group the data by col1 to count the number of Yes in col3 and col4. Then group again by all columns and count the number of observations in each group using n(). Finally, use tidyr::pivot_wider to transform data from long to wide.

df %>%
  group_by(col1) %>%
  mutate_at(vars(col3:col4), ~ sum(. == "Yes")) %>%
  rename(status_1 = col3, status_2 = col4) %>% 
  group_by_all %>%
  summarise(n = n()) %>%
  tidyr::pivot_wider(names_from = col2, values_from = n, values_fill = list(n = 0))

# # A tibble: 4 x 6
#   col1   status_1 status_2  High Medium   Low
#   <fct>     <int>    <int> <int>  <int> <int>
# 1 Team A        2        1     1      2     0
# 2 Team B        1        2     0      0     3
# 3 Team C        1        1     1      1     1
# 4 Team D        3        2     0      3     0

Upvotes: 3

JBGruber
JBGruber

Reputation: 12410

I would use use grepl and sum to simply count the matches:

df %>% 
  mutate_if(is.factor, as.character) %>% # your example data was sotred as factor
  group_by(col1) %>% 
  summarise(High = sum(grepl("High", col2)),
            Medium = sum(grepl("Medium", col2)),
            Low = sum(grepl("Low", col2)),
            Status_1 = sum(grepl("Yes", col3)),
            Status_2 = sum(grepl("Yes", col4)))
#> # A tibble: 4 x 6
#>   col1    High Medium   Low Status_1 Status_2
#>   <chr>  <int>  <int> <int>    <int>    <int>
#> 1 Team A     1      2     0        2        1
#> 2 Team B     0      0     3        1        2
#> 3 Team C     1      1     1        1        1
#> 4 Team D     0      3     0        3        2

Created on 2019-11-30 by the reprex package (v0.3.0)

Instead of grepl you can also use str_count or str_detect from stringr. All are doing the same in this case. The important thing is to use sum so counts are aggreagted to one value.

Upvotes: 2

Related Questions