Beginner
Beginner

Reputation: 23

Calculate consecutive winning streak based on years

This is about athletes competing in the olympia. I'm supposed to calculate the top 10 athletes that held a medal for the longest time.

for example: won in 2004, 2008, 2012 --> therefore the athlete won 3 times in a row.

I'm just learning about R and I'm loosing my mind over it.

I don't even know where to start with solving this problem.

My data is "cleaned" as far as possible: - only athletes that won a gold medal - get the actual year they've won out of string

My columns (after cleaning)

id    name          team        year    medal
1     john doe      USA         2004    gold
1     john doe      USA         2008    gold
1     john doe      USA         2012    gold
2     marc twain    GER         2016    gold
3     edgar poe     FIN         2000    gold
3     edgar poe     FIN         2008    gold

I've tried some things like:

mutate(won =
           if_else(condition = year == year +4,
                   true = "won",
                   false = "lost"))

or something like

mutate(won =
           if_else(
             condition = (year + 4) == tmp_year,
             true = "Following Year",
             false = if_else(
               condition = year == tmp_year,
               true = "Actual year",
               false = "No")))

Here I only get Actual Year and No as answer.

In the end, i want a table that shows me how many times an athelte won the gold medal in a row.

So for example data set it would be something like this:

id    name          won        
1     john doe      3
2     marc twain    1
3     edgar poe     1

Edit: I'm not looking for a complete answer, more like inspiration: what functions could be interesting to look at.

Upvotes: 2

Views: 458

Answers (2)

A. Suliman
A. Suliman

Reputation: 13135

Here is one option using cumsum and dplyr::lead with default equal to year+4 taking into account where a player can have more than one series of medals

library(dplyr)
df %>% group_by(id) %>% 
       mutate(flag=lead(year,default = last(year)+4)-year, won=cumsum(flag==4)) %>% 
       select(-flag) %>% slice(which.max(won))

# A tibble: 3 x 6
# Groups:   id [3]
       id name       team   year medal   won
    <int> <chr>      <chr> <int> <chr> <int>
  1     1 john doe   USA    2012 gold      3
  2     2 marc twain GER    2016 gold      1
  3     3 edgar poe  FIN    2008 gold      1

Update by @akrun

This can be done in a compact way with

df %>% group_by(id, name, team) %>% 
       mutate(yearlead = lead(year, default = year[n()]+4), yeardiff = yearlead - year) %>% 
       group_by( grp = rleid(case_when(yeardiff == 4 ~ as.integer(yeardiff), TRUE ~ row_number())), add = TRUE) %>% 
       summarise(n = n())

# A tibble: 4 x 5
# Groups:   id, name, team [?]
  id name       team    grp     n
  <int> <chr>      <chr> <int> <int>
  1     1 john doe   USA       1     3
  2     2 marc twain GER       1     1
  3     3 edgar poe  FIN       1     1
  4     3 edgar poe  FIN       2     1

Data (This data different from the OP dataset)

df <- structure(list(id = c(1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L), name = c("john doe", "john doe", "john doe", "marc twain", "edgar poe", "edgar poe", "edgar poe", "edgar poe", "edgar poe"), 
       team = c("USA", "USA", "USA", "GER", "FIN", "FIN", "FIN", "FIN", "FIN"), year = c(2004L, 2008L, 2012L, 2016L, 2000L, 2008L, 2016L, 2020L, 2024L), medal = c("gold", "gold", "gold", "gold", "gold", "gold", "gold", "gold", "gold" )), class = "data.frame", row.names = c(NA, -9L))

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389225

Using dplyr we can calculate the difference in winning years of gold medals using diff for each name, then group_by name and the difference and calculate the consecutive winnings.

library(dplyr)

df %>%
 group_by(name) %>%
 mutate(diff = c(4,diff(year))) %>%
 group_by(name, diff) %>%
 summarise(count = n()) %>%
 select(-diff)


#    name      count
#   <fct>     <int>
#1 edgarpoe      1
#2 edgarpoe      1
#3 johndoe       3
#4 marctwain     1

Upvotes: 1

Related Questions