Reputation: 41
I'd like to take every pitcher's last pitch per game in an MLB dataset and mark it as True. However, I'm having trouble adding columns or modifying the dataframe within a GroupBy object. How could I effectively add this column?
data['last_pitch'] = False
g = data.groupby(['gameString', 'pitcherId'])
for x, pitcher in g:
    pitcher.iloc[-1]['last_pitch'] = True
Upvotes: 2
Views: 65
Reputation: 164773
It's tempting to use GroupBy for this. However, there are often alternative methods when you aren't looking to actually aggregate data. Here, you can use pd.Series.duplicated with keep='last':
# data from gyoza
df['last_pitch'] = ~df['pitcherId'].duplicated(keep='last')
print(df)
  gameString pitcherId  last_pitch
0          a         c       False
1          a         c        True
2          b         d       False
3          b         d       False
4          b         d        True
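Note that the snippet above keys only on pitcherId, which works for this sample because each pitcher appears in a single game. If a pitcher can appear in several games, one option is DataFrame.duplicated with a subset of both columns. A minimal sketch, using made-up sample data (the frame below is an assumption for illustration, not the data from the other answer) and assuming rows are already in pitch order:

```python
import pandas as pd

# Hypothetical sample data: pitcher 'c' appears in games 'a' and 'b'.
df = pd.DataFrame({
    'gameString': ['a', 'a', 'b', 'b', 'b'],
    'pitcherId':  ['c', 'c', 'c', 'd', 'd'],
})

# keep='last' flags every row except the final occurrence of each
# (gameString, pitcherId) pair; negating leaves True on the last pitch only.
df['last_pitch'] = ~df.duplicated(subset=['gameString', 'pitcherId'], keep='last')
print(df)
```

Because duplicated works on frame order, the rows for a given pitcher do not need to be contiguous, but they do need to be sorted so that the final pitch comes last.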
If you really wish to use GroupBy, you can use the last method:
idx = df.reset_index().groupby('pitcherId')['index'].last().values
df['last_pitch'] = df.index.isin(idx)
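The same idea can be grouped on both gameString and pitcherId so that it matches the per-game requirement in the question. A rough, self-contained sketch, again with made-up sample data and assuming the frame's index is unnamed (so that reset_index produces a column called 'index'):

```python
import pandas as pd

# Hypothetical sample data: pitcher 'c' appears in games 'a' and 'b'.
df = pd.DataFrame({
    'gameString': ['a', 'a', 'b', 'b', 'b'],
    'pitcherId':  ['c', 'c', 'c', 'd', 'd'],
})

# Expose the row labels as a regular column, then take the last label
# seen in each (gameString, pitcherId) group.
idx = (
    df.reset_index()
      .groupby(['gameString', 'pitcherId'])['index']
      .last()
      .to_numpy()
)

# Mark rows whose label is one of those last-row labels.
df['last_pitch'] = df.index.isin(idx)
print(df)
```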
Upvotes: 1
Reputation: 28367
One way is to find all the indices of the rows that you want to change with tail and then use loc to change them in the original dataframe:
last_rows = data.groupby(['gameString', 'pitcherId']).tail(n=1)
data.loc[last_rows.index, 'last_pitch'] = True
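For reference, here is a complete run of this approach, including the column initialization from the question. The sample frame is invented for illustration, and the sketch assumes the index labels are unique, since loc selects by label:

```python
import pandas as pd

# Hypothetical sample data: pitcher 'c' appears in games 'a' and 'b'.
data = pd.DataFrame({
    'gameString': ['a', 'a', 'b', 'b', 'b'],
    'pitcherId':  ['c', 'c', 'c', 'd', 'd'],
})

# Start with every pitch marked as not-last, as in the question.
data['last_pitch'] = False

# tail(n=1) returns the final row of each group, keeping original index labels.
last_rows = data.groupby(['gameString', 'pitcherId']).tail(n=1)

# Assign through loc on the original frame so the change is not made on a copy.
data.loc[last_rows.index, 'last_pitch'] = True
print(data)
```

Writing through data.loc is what makes this work where the loop in the question does not: the loop assigns into a temporary object derived from each group, so the original frame is never modified.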
Upvotes: 0