Calculate consecutive winning streak based on years

Question

This is about athletes competing in the Olympics. I'm supposed to find the top 10 athletes who held a medal for the longest time, i.e. who won at the most consecutive Games.

For example: an athlete who won in 2004, 2008 and 2012 won 3 times in a row.

I'm just learning R and I'm losing my mind over it.

I don't even know where to start with solving this problem.

My data is "cleaned" as far as possible:
- only athletes that won a gold medal are kept
- the actual year of the win has been extracted from a string

My columns (after cleaning)

id    name          team        year    medal
1     john doe      USA         2004    gold
1     john doe      USA         2008    gold
1     john doe      USA         2012    gold
2     marc twain    GER         2016    gold
3     edgar poe     FIN         2000    gold
3     edgar poe     FIN         2008    gold

I've tried some things like:

mutate(won =
           if_else(condition = year == year +4,
                   true = "won",
                   false = "lost"))

or something like

mutate(won =
           if_else(
             condition = (year + 4) == tmp_year,
             true = "Following Year",
             false = if_else(
               condition = year == tmp_year,
               true = "Actual year",
               false = "No")))

Here I only get "Actual year" and "No" as results.

In the end, I want a table that shows how many times in a row an athlete won the gold medal.

So for the example data set it would be something like this:

id    name          won        
1     john doe      3
2     marc twain    1
3     edgar poe     1

Edit: I'm not looking for a complete answer, more like inspiration: what functions could be interesting to look at.

Ronak Shah · Accepted Answer

Using dplyr, we can calculate the difference between successive gold-medal years with diff() for each name (the first win has no predecessor, so its difference is set to 4), then group_by name and that difference and count the rows in each group to get the consecutive wins.

library(dplyr)

df %>%
 group_by(name) %>%
 mutate(diff = c(4,diff(year))) %>%
 group_by(name, diff) %>%
 summarise(count = n()) %>%
 select(-diff)


#    name      count
#        
#1 edgarpoe      1
#2 edgarpoe      1
#3 johndoe       3
#4 marctwain     1
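
Note that grouping by the difference itself gives one row per distinct gap, which is why edgarpoe appears twice above, and it would merge two separate 4-year runs of the same athlete into one count. A possible variation, offered only as a sketch, is to give each run its own id with cumsum() and keep the longest run per athlete. The data frame below is rebuilt from the question's example; the column names run, streak and won and the object name streaks are my own choices, and the sketch assumes that a gap of exactly 4 years means "consecutive" (which ignores cancelled Games and the Summer/Winter split). It also assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Example data rebuilt from the question
df <- data.frame(
  id    = c(1, 1, 1, 2, 3, 3),
  name  = c("john doe", "john doe", "john doe",
            "marc twain", "edgar poe", "edgar poe"),
  team  = c("USA", "USA", "USA", "GER", "FIN", "FIN"),
  year  = c(2004, 2008, 2012, 2016, 2000, 2008),
  medal = "gold"
)

streaks <- df %>%
  # one row per athlete and year, in case of several golds at the same Games
  distinct(id, name, year) %>%
  arrange(id, year) %>%
  group_by(id, name) %>%
  # a new run starts at the first win and whenever the gap is not 4 years
  mutate(run = cumsum(c(TRUE, diff(year) != 4))) %>%
  # number of wins in each run, per athlete
  count(run, name = "streak") %>%
  # keep only the longest run of each athlete
  summarise(won = max(streak), .groups = "drop") %>%
  arrange(desc(won))

streaks
```

For the example data this should give 3 for john doe and 1 each for marc twain and edgar poe, matching the table asked for in the question. With the full data, head(streaks, 10) would then return the ten longest streaks.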
