Jazzmatazz
Jazzmatazz

Reputation: 645

Group by DF and then Filter using dplyr

This might be relatively easy in dplyr. Sample question uses the Lahman package data.

What player managed both the NYA and NYN under teamID?

# get master player table
players <- Lahman::People

# get manager table
managers <- Lahman::Managers

# merge players to managers
manager_tbl <- 
  managers %>% 
  left_join(players)

I want to get the results for the players under playerID that have a row for both NYA and NYN under teamID.

How would I go about doing this? I'm guessing that I would need to group at playerID. berrayo01 is one of the answers.

Upvotes: 2

Views: 51

Answers (1)

akrun
akrun

Reputation: 887891

After grouping by 'playerID', filter all groups having both 'NYA' and 'NYN' %in% 'teamID'

library(dplyr)
manager_tbl %>%
        group_by(playerID) %>%
         filter(all(c("NYA", "NYN") %in% teamID))
# A tibble: 69 x 35
# Groups:   playerID [4]
#   playerID yearID teamID lgID  inseason     G     W     L  rank plyrMgr birthYear birthMonth birthDay birthCountry birthState birthCity deathYear deathMonth deathDay deathCountry deathState
#   <chr>     <int> <fct>  <fct>    <int> <int> <int> <int> <int> <fct>       <int>      <int>    <int> <chr>        <chr>      <chr>         <int>      <int>    <int> <chr>        <chr>     
# 1 stengca…   1934 BRO    NL           1   153    71    81     6 N            1890          7       30 USA          MO         Kansas C…      1975          9       29 USA          CA        
# 2 stengca…   1935 BRO    NL           1   154    70    83     5 N            1890          7       30 USA          MO         Kansas C…      1975          9       29 USA          CA        
# 3 stengca…   1936 BRO    NL           1   156    67    87     7 N            1890          7       30 USA          MO         Kansas C…      1975          9       29 USA          CA        
# 4 stengca…   1938 BSN    NL           1   153    77    75     5 N            1890          7       30 USA          MO         Kansas C…      1975          9       29 USA          CA        
# 5 stengca…   1939 BSN    NL           1   152    63    88     7 N            1890          7       30 USA          MO         Kansas C…      1975          9       29 USA          CA        
# 6 stengca…   1940 BSN    NL           1   152    65    87     7 N            1890          7       30 USA          MO         Kansas C…      1975          9       29 USA          CA        
# 7 stengca…   1941 BSN    NL           1   156    62    92     7 N            1890          7       30 USA          MO         Kansas C…      1975          9       29 USA          CA        
# 8 stengca…   1942 BSN    NL           1   150    59    89     7 N            1890          7       30 USA          MO         Kansas C…      1975          9       29 USA          CA        
# 9 stengca…   1943 BSN    NL           2   107    47    60     6 N            1890          7       30 USA          MO         Kansas C…      1975          9       29 USA          CA        
#10 stengca…   1949 NYA    AL           1   155    97    57     1 N            1890          7       30 USA          MO         Kansas C…      1975          9       29 USA          CA        
# … with 59 more rows, and 14 more variables: deathCity <chr>, nameFirst <chr>, nameLast <chr>, nameGiven <chr>, weight <int>, height <int>, bats <fct>, throws <fct>, debut <chr>,
#   finalGame <chr>, retroID <chr>, bbrefID <chr>, deathDate <date>, birthDate <date>

Upvotes: 2

Related Questions