Metsfan
Metsfan

Reputation: 520

In R need count of consecutive years played

I need to know for each player who played for the New York Mets (NYM) how many consecutive years they played, starting with the year they first joined the team.

Sample Input

Sample Input

In the sample output below, there is an error I need fixed. Player C started playing for the Mets in 1984 (see Sample Input above), but his first stint with the team ended in 1987 as he did not play for them in 1988, so though the Sample Output shows it as 5 consecutive years, it should be 4.

I would prefer a dplyr solution but am open to others.

I checked the available stackoverflow solutions, but could not find one that directly addressed my situation. If one exists, I would appreciate it if you could share its link.

Sample Output

Sample Output

Here is my code so far:

cym |>
  filter(Team == "NYM") |>
  group_by(Player) |>
  count(name = "Consec_Yrs")

Dput(cym):

structure(list(Player = c("A", "A", "A", "A", "B", "B", "B", 
"B", "C", "C", "C", "C", "C", "C", "C", "C", "D"), Team = c("NYM", 
"NYM", "LAD", "LAD", "SFG", "NYM", "NYM", "NYM", "PHA", "NYM", 
"NYM", "NYM", "NYM", "ATL", "ATL", "NYM", "NYM"), Year = c(1970, 
1971, 1972, 1973, 1980, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 
1987, 1987, 1988, 1989, 2000)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -17L))

Upvotes: 0

Views: 76

Answers (2)

PaulS
PaulS

Reputation: 25383

Another possible solution:

library(dplyr)

df %>% 
  group_by(Player, Team) %>% 
  mutate(aux = cumsum(c(1, diff(Year) == 1))) %>% 
  summarise(Consec_yrs = max(aux), .groups = "drop")

#> # A tibble: 8 × 3
#>   Player Team  Consec_yrs
#>   <chr>  <chr>      <dbl>
#> 1 A      LAD            2
#> 2 A      NYM            2
#> 3 B      NYM            3
#> 4 B      SFG            1
#> 5 C      ATL            2
#> 6 C      NYM            4
#> 7 C      PHA            1
#> 8 D      NYM            1

Upvotes: 2

akrun
akrun

Reputation: 887571

We may need to create a grouping based on the consecutive years

library(dplyr)
out <- cym %>%
   group_by(Team, Player) %>%
   mutate(grpYear = cumsum(c(TRUE, diff(Year) != 1))) %>% 
   group_by(grpYear, .add = TRUE)  %>%
   count(name = "Consec_Yrs") %>%
   ungroup

-output

> out %>% filter(Team == "NYM")
# A tibble: 5 × 4
  Team  Player grpYear Consec_Yrs
  <chr> <chr>    <int>      <int>
1 NYM   A            1          2
2 NYM   B            1          3
3 NYM   C            1          4
4 NYM   C            2          1
5 NYM   D            1          1

If we want only a single max consecutive count per group, then

out %>% 
  group_by(Team, Player) %>%
  slice_max(n = 1, order_by = Consec_Yrs) %>%
  ungroup
# A tibble: 8 × 4
  Team  Player grpYear Consec_Yrs
  <chr> <chr>    <int>      <int>
1 ATL   C            1          2
2 LAD   A            1          2
3 NYM   A            1          2
4 NYM   B            1          3
5 NYM   C            1          4
6 NYM   D            1          1
7 PHA   C            1          1
8 SFG   B            1          1

Upvotes: 1

Related Questions