Caitlin Luo

Why won't my group_by function combine rows with the same sport name?

Below is a reprex of my code. I'm trying to group the data set below by sport type, but when I use the group_by() function, rows with the same sport type aren't grouped together. For example, in the output below, the rows with the sport type 'All Track Combined' aren't collapsed into a single row.

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.1.3
#> Warning: package 'ggplot2' was built under R version 4.1.3
#> Warning: package 'tibble' was built under R version 4.1.3
#> Warning: package 'dplyr' was built under R version 4.1.3
# install.packages("tidytuesdayR")  # run once if the package isn't installed yet
tuesdata <- tidytuesdayR::tt_load('2022-03-29')
#> --- Compiling #TidyTuesday Information for 2022-03-29 ----
#> --- There is 1 file available ---
#> --- Starting Download ---
#> 
#>  Downloading file 1 of 1: `sports.csv`
#> --- Download complete ---
tuesdata$sports %>% 
  dplyr::group_by(sports) %>%
  dplyr::summarise(sports = sports, prop = (partic_men)/(partic_men + partic_women)) %>%
  na.omit() 
#> `summarise()` has grouped output by 'sports'. You can override using the
#> `.groups` argument.
#> # A tibble: 43,614 x 2
#> # Groups:   sports [31]
#>    sports              prop
#>    <chr>              <dbl>
#>  1 All Track Combined 0.570
#>  2 All Track Combined 0.556
#>  3 All Track Combined 0.513
#>  4 All Track Combined 0.494
#>  5 All Track Combined 0.450
#>  6 All Track Combined 0.567
#>  7 All Track Combined 0.478
#>  8 All Track Combined 0.464
#>  9 All Track Combined 0.492
#> 10 All Track Combined 0.512
#> # ... with 43,604 more rows

Answers (1)

Julian

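Is this what you want? summarise() only collapses each group into a single row when every expression returns a single value per group. In your code, sports = sports and (partic_men)/(partic_men + partic_women) are computed row by row, so each group keeps one row per input row. Summing the counts within each sport first gives one row per sport, and na.rm = TRUE takes the place of the na.omit() step: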

tuesdata$sports %>%
  dplyr::group_by(sports) %>%
  dplyr::summarise(
    prop = sum(partic_men, na.rm = TRUE) /
      (sum(partic_men, na.rm = TRUE) + sum(partic_women, na.rm = TRUE))
  )

Output:

  sports                 prop
   <chr>                 <dbl>
 1 All Track Combined 0.485   
 2 Archery            0.439   
 3 Badminton          0       
 4 Baseball           1       
 5 Basketball         0.536   
 6 Beach Volleyball   0.0192  
 7 Bowling            0.402   
 8 Diving             0.468   
 9 Equestrian         0.000675
10 Fencing            0.480   
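To see the difference on something small, here is a sketch that uses a hypothetical toy tibble (toy, with made-up counts) in place of tuesdata$sports and assumes dplyr is installed. The first pipeline mirrors the code above and returns one pooled proportion per sport (total men divided by total participants). The second, shown only for comparison, averages the per-row proportions instead, which weights every row equally and can give a different number. As far as I know, dplyr 1.1.0 and later also warns when summarise() returns more than one row per group, as the question's code does, and points to reframe() for that case.

```r
library(dplyr)

# Hypothetical toy data standing in for tuesdata$sports:
# each row is one school's participant counts for one sport.
toy <- tibble(
  sports       = c("Archery", "Archery", "Baseball", "Baseball"),
  partic_men   = c(10, 20, 30, 20),
  partic_women = c(10, 5, 0, 0)
)

# Pooled proportion: sum the counts within each sport, then divide.
# Every expression returns one value per group, so there is one row per sport.
pooled <- toy %>%
  group_by(sports) %>%
  summarise(
    prop = sum(partic_men, na.rm = TRUE) /
      (sum(partic_men, na.rm = TRUE) + sum(partic_women, na.rm = TRUE)),
    .groups = "drop"
  )

# For comparison only: the mean of the per-row proportions.
# Each row counts equally, regardless of how many athletes it represents.
averaged <- toy %>%
  mutate(prop = partic_men / (partic_men + partic_women)) %>%
  group_by(sports) %>%
  summarise(prop = mean(prop, na.rm = TRUE), .groups = "drop")

print(pooled)   # Archery: 30/45 = 0.667, Baseball: 50/50 = 1
print(averaged) # Archery: (0.5 + 0.8) / 2 = 0.65, Baseball: 1
```

Which of the two is appropriate depends on the question being asked; the pooled version is what the code above computes.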
