TvCasteren

Reputation: 425

Summarizing a specific column with dplyr

For my assignment I need to create an object which contains, for each combination of Sex and Season, the number of different sports in the olympics data set. The columns of this object should be called Competitor_Sex, Olympic_Season, and Num_Sports, respectively.

This is what I have at the moment:

object <- olympics %>%
  group_by(Sex, Season) %>%
  summarise(Num_Sports = ???)

I'm having trouble with defining the third column, which is the number of sports. My data looks like this:

structure(list(Name = c("A Lamusi", "Juhamatti Tapio Aaltonen", 
"Andreea Aanei", "Jamale (Djamel-) Aarrass (Ahrass-)", "Nstor Abad Sanjun"
), Sex = c("M", "M", "F", "M", "M"), Age = c(23L, 28L, 22L, 30L, 
23L), Height = c(170L, 184L, 170L, 187L, 167L), Weight = c(60, 
85, 125, 76, 64), Team = c("China", "Finland", "Romania", "France", 
"Spain"), NOC = c("CHN", "FIN", "ROU", "FRA", "ESP"), Games = c("2012 Summer", 
"2014 Winter", "2016 Summer", "2012 Summer", "2016 Summer"), 
    Year = c(2012L, 2014L, 2016L, 2012L, 2016L), Season = c("Summer", 
    "Winter", "Summer", "Summer", "Summer"), City = c("London", 
    "Sochi", "Rio de Janeiro", "London", "Rio de Janeiro"), Sport = c("Judo", 
    "Ice Hockey", "Weightlifting", "Athletics", "Gymnastics"), 
    Event = c("Judo Men's Extra-Lightweight", "Ice Hockey Men's Ice Hockey", 
    "Weightlifting Women's Super-Heavyweight", "Athletics Men's 1,500 metres", 
    "Gymnastics Men's Individual All-Around"), Medal = c(NA, 
    "Bronze", NA, NA, NA)), row.names = c("1", "2", "3", "4", 
"5"), class = "data.frame")

There is probably an easy way to do this. Could someone help me? It would be much appreciated!

Best Regards,

Upvotes: 0

Views: 228

Answers (2)

llrs

Reputation: 3397

You can use n_distinct(), which is dplyr's equivalent of length(unique(x)):

olympics %>% 
  group_by(Sex, Season) %>% 
  summarise(Num_Sports = n_distinct(Sport)) %>%  # count the distinct sports per group
  rename(Competitor_Sex = Sex, Olympic_Season = Season) # To rename the columns
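As a quick check, here is a minimal sketch run on the five sample rows from the question (only the three relevant columns are kept). The object name result is just for illustration, renaming inside group_by() is one possible shortcut rather than the only way to do it, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# The five sample rows from the question, reduced to the columns needed here
olympics <- data.frame(
  Sex    = c("M", "M", "F", "M", "M"),
  Season = c("Summer", "Winter", "Summer", "Summer", "Summer"),
  Sport  = c("Judo", "Ice Hockey", "Weightlifting", "Athletics", "Gymnastics")
)

# n_distinct(x) gives the same count as length(unique(x))
n_distinct(olympics$Sport) == length(unique(olympics$Sport))

# 'result' is a hypothetical name; grouping columns are renamed while grouping
result <- olympics %>%
  group_by(Competitor_Sex = Sex, Olympic_Season = Season) %>%
  summarise(Num_Sports = n_distinct(Sport), .groups = "drop")

print(result)
```

On this sample, the group M/Summer contains three different sports, while F/Summer and M/Winter contain one each.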

Upvotes: 1

Alex

Reputation: 4995

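Grouping twice should work. The first summarise() collapses the data to one row per Sex/Season/Sport combination, and the second counts those rows within each Sex/Season group:

olympics %>% 
  group_by(Sex, Season, Sport) %>% 
  summarise(n = n()) %>%           # one row per Sex/Season/Sport combination
  group_by(Sex, Season) %>%
  summarise(Num_Sports = n()) %>%  # rows per group = number of different sports
  rename(Competitor_Sex = Sex, Olympic_Season = Season)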
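The same two-step idea (first reduce to unique Sex/Season/Sport combinations, then count rows per Sex/Season) can also be sketched with distinct() and count(). This is an alternative spelling, not what the answer above uses. The sixth row in the sample data is a made-up duplicate added only to show that repeated sports are counted once, and sports_per_group is a hypothetical name.

```r
library(dplyr)

# Sample rows from the question plus one hypothetical duplicate (row 6)
olympics <- data.frame(
  Sex    = c("M", "M", "F", "M", "M", "M"),
  Season = c("Summer", "Winter", "Summer", "Summer", "Summer", "Summer"),
  Sport  = c("Judo", "Ice Hockey", "Weightlifting", "Athletics",
             "Gymnastics", "Judo")
)

sports_per_group <- olympics %>%
  distinct(Sex, Season, Sport) %>%           # one row per unique combination
  count(Sex, Season, name = "Num_Sports") %>% # rows per Sex/Season group
  rename(Competitor_Sex = Sex, Olympic_Season = Season)

print(sports_per_group)
```

Because the duplicate Judo row is dropped by distinct() before counting, M/Summer still has three sports here.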

Upvotes: 1
