Reputation: 421
I have a pandas DataFrame that contains NHL player data from multiple seasons. I'm trying to remove all rows for players who didn't play in 2018-2019. For example, if Joe Jones played in 2018-2019, I want to keep his data from that season and from any other season he has played in.
I'm thinking the code would look something like this:
for player in data.players:
    if data['Year'] == '2018-2019':
        save player's name
    else:
        remove player's data
For example, my DataFrame looks like this:
Year       Player  TM   GP
2018-2019  Joe     MTL  78
2017-2018  Joe     MTL  82
2016-2017  Joe     MTL  80
2017-2018  Jim     STL  76
2016-2017  Jim     STL  82
2018-2019  Jack    MIN  82
The result would be:
Year       Player  TM   GP
2018-2019  Joe     MTL  78
2017-2018  Joe     MTL  82
2016-2017  Joe     MTL  80
2018-2019  Jack    MIN  82
Upvotes: 3
Views: 85
Reputation: 294506
groupby.filter
df.groupby('Player').filter(lambda d: '2018-2019' in {*d.Year})
        Year Player   TM  GP
0  2018-2019    Joe  MTL  78
1  2017-2018    Joe  MTL  82
2  2016-2017    Joe  MTL  80
5  2018-2019   Jack  MIN  82
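To try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same groupby.filter idea. The helper name played_in and its season parameter are made up for illustration; they are not part of pandas or of the one-liner above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": ["2018-2019", "2017-2018", "2016-2017",
             "2017-2018", "2016-2017", "2018-2019"],
    "Player": ["Joe", "Joe", "Joe", "Jim", "Jim", "Jack"],
    "TM": ["MTL", "MTL", "MTL", "STL", "STL", "MIN"],
    "GP": [78, 82, 80, 76, 82, 82],
})

def played_in(group, season="2018-2019"):
    """Hypothetical helper: True if any row of this player's group is from `season`."""
    return bool((group["Year"] == season).any())

# filter() calls played_in once per player and keeps every row
# of the groups for which it returns True.
kept = df.groupby("Player").filter(played_in)
print(kept)
```

Since filter works on whole groups, Joe's older seasons are kept because at least one of his rows matches, while all of Jim's rows are dropped.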
Same thing, but use the values array instead of a set:
df.groupby('Player').filter(lambda d: '2018-2019' in d.Year.values)
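For comparison, the same result can probably be reached without groupby at all. This is a different technique from the ones in this answer (a boolean mask built with isin), shown only as a sketch on the question's sample data:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": ["2018-2019", "2017-2018", "2016-2017",
             "2017-2018", "2016-2017", "2018-2019"],
    "Player": ["Joe", "Joe", "Joe", "Jim", "Jim", "Jack"],
    "TM": ["MTL", "MTL", "MTL", "STL", "STL", "MIN"],
    "GP": [78, 82, 80, 76, 82, 82],
})

# Step 1: collect the names of players with a 2018-2019 row.
players_2018 = df.loc[df["Year"] == "2018-2019", "Player"]

# Step 2: keep every row (any season) belonging to one of those players.
kept = df[df["Player"].isin(players_2018)]
print(kept)
```

This closely mirrors the pseudocode in the question: first save the names, then drop everyone else.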
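A NumPy alternative using pd.factorize and np.logical_or.at:
import numpy as np
import pandas as pd
m = df.Year.values == '2018-2019'  # True for rows from the target season
i, u = pd.factorize(df.Player)     # i: integer code per row, u: unique players
a = np.zeros(len(u), bool)         # one flag per unique player, initially False
np.logical_or.at(a, i, m)          # set a player's flag if any of their rows matches
df[a[i]]                           # map the flags back onto rows and filter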
        Year Player   TM  GP
0  2018-2019    Joe  MTL  78
1  2017-2018    Joe  MTL  82
2  2016-2017    Joe  MTL  80
5  2018-2019   Jack  MIN  82
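The NumPy version is terse, so here is a sketch that runs the same steps on the question's sample data with the intermediate arrays spelled out. The longer variable names are mine; the calls (pd.factorize, np.logical_or.at) are the ones used above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": ["2018-2019", "2017-2018", "2016-2017",
             "2017-2018", "2016-2017", "2018-2019"],
    "Player": ["Joe", "Joe", "Joe", "Jim", "Jim", "Jack"],
    "TM": ["MTL", "MTL", "MTL", "STL", "STL", "MIN"],
    "GP": [78, 82, 80, 76, 82, 82],
})

# Row-level mask: which rows are from the target season?
row_matches = df["Year"].to_numpy() == "2018-2019"
# row_matches: [True, False, False, False, False, True]

# Encode each player as an integer, in order of first appearance.
codes, uniques = pd.factorize(df["Player"])
# codes: [0, 0, 0, 1, 1, 2], uniques: ['Joe', 'Jim', 'Jack']

# One flag per unique player; ufunc.at applies the OR unbuffered, so
# repeated codes accumulate instead of overwriting each other.
player_flags = np.zeros(len(uniques), dtype=bool)
np.logical_or.at(player_flags, codes, row_matches)
# player_flags: [True, False, True]

# Broadcast the per-player flags back to rows and filter.
kept = df[player_flags[codes]]
print(kept)
```

The unbuffered .at call matters here: a plain player_flags[codes] |= row_matches would let a later False row for the same player overwrite an earlier True.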
Upvotes: 5