Reputation: 71
Suppose I am having the following data. [As requested I am adding the data]
col1 <- c("Team A", "Team A", "Team A", "Team B", "Team B", "Team B", "Team C", "Team C", "Team C", "Team D", "Team D", "Team D")
col2 <- c("High", "Medium", "Medium", "Low", "Low", "Low", "High", "Medium", "Low", "Medium", "Medium", "Medium")
col3 <- c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes")
col4 <- c("No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Yes")
df <- data.frame(col1, col2, col3, col4)
# Col1 Col2 Col3 Col4
# Team A High Yes No
# Team A Medium Yes Yes
# Team A Medium No No
# Team B Low No Yes
# Team B Low No Yes
# Team B Low Yes No
# Team C High No No
# Team C Medium Yes Yes
# Team C Low No No
# Team D Medium Yes Yes
# Team D Medium Yes No
# Team D Medium Yes Yes
I want to use dplyr
function to get the following result. Status_1 would need to be the count of the number of "Yes" in Col3 against each team while Status_2 would be the count of "Yes" in Col4 against each team
High Medium Low Status_1 Status_2
Team A 1 2 0 2 1
Team B 0 0 3 1 2
Team C 1 1 1 1 1
Team D 0 3 0 3 2
I am able to generate the normal summary but for the last two columns of "Status_1" and "Status_2" using the following statement. Can anyone help, please?
df %>%
group_by(Col1, Col2) %>%
summarise(Count = n()) %>%
spread(Col1, Count, fill = 0)
Upvotes: 4
Views: 3003
Reputation: 35554
First, group the data by col1
to count the number of Yes
in col3
and col4
. Then group again by all columns and count the number of observations in each group using n()
. Finally, use tidyr::pivot_wider
to transform data from long to wide.
df %>%
group_by(col1) %>%
mutate_at(vars(col3:col4), ~ sum(. == "Yes")) %>%
rename(status_1 = col3, status_2 = col4) %>%
group_by_all %>%
summarise(n = n()) %>%
tidyr::pivot_wider(names_from = col2, values_from = n, values_fill = list(n = 0))
# # A tibble: 4 x 6
# col1 status_1 status_2 High Medium Low
# <fct> <int> <int> <int> <int> <int>
# 1 Team A 2 1 1 2 0
# 2 Team B 1 2 0 0 3
# 3 Team C 1 1 1 1 1
# 4 Team D 3 2 0 3 0
Upvotes: 3
Reputation: 12410
I would use use grepl
and sum
to simply count the matches:
df %>%
mutate_if(is.factor, as.character) %>% # your example data was sotred as factor
group_by(col1) %>%
summarise(High = sum(grepl("High", col2)),
Medium = sum(grepl("Medium", col2)),
Low = sum(grepl("Low", col2)),
Status_1 = sum(grepl("Yes", col3)),
Status_2 = sum(grepl("Yes", col4)))
#> # A tibble: 4 x 6
#> col1 High Medium Low Status_1 Status_2
#> <chr> <int> <int> <int> <int> <int>
#> 1 Team A 1 2 0 2 1
#> 2 Team B 0 0 3 1 2
#> 3 Team C 1 1 1 1 1
#> 4 Team D 0 3 0 3 2
Created on 2019-11-30 by the reprex package (v0.3.0)
Instead of grepl
you can also use str_count
or str_detect
from stringr
. All are doing the same in this case. The important thing is to use sum
so counts are aggreagted to one value.
Upvotes: 2