Jeremy414

Reputation: 41

Data processing within GroupBy objects. How to add columns?

I'd like to flag every pitcher's last pitch in each game of an MLB dataset as True. However, I'm having trouble adding columns to, or modifying, the dataframe from within a GroupBy object. How can I add this column?

data['last_pitch'] = False
g = data.groupby(['gameString', 'pitcherId'])
for x, pitcher in g:
    pitcher.iloc[-1]['last_pitch'] = True
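The loop above leaves `data` unchanged: each `pitcher` yielded by the GroupBy is a separate DataFrame, and `pitcher.iloc[-1]['last_pitch'] = True` assigns into a temporary row of that separate object. A minimal sketch of a loop that does work writes back to the original frame through `data.loc`, using the last index label of each group. It assumes `data` has a unique index and that rows are already in pitch order; the sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample: pitcher 'p1' appears in two games
data = pd.DataFrame({
    'gameString': ['g1', 'g1', 'g1', 'g2', 'g2'],
    'pitcherId':  ['p1', 'p1', 'p2', 'p1', 'p1'],
})

data['last_pitch'] = False
g = data.groupby(['gameString', 'pitcherId'])
for _, pitcher in g:
    # Write back to the original frame via the group's last index label;
    # assigning into `pitcher` itself would not change `data`.
    data.loc[pitcher.index[-1], 'last_pitch'] = True

print(data)
```

Looping over groups works but is slow on a full season of pitches; the vectorised answers below avoid the Python-level loop.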

Upvotes: 2

Views: 65

Answers (2)

jpp

Reputation: 164773

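It's tempting to use GroupBy for this. However, when you aren't actually aggregating data there are often simpler alternatives. Here, you can use pd.DataFrame.duplicated on both key columns with keep='last' and invert the result:

import pandas as pd

# data from gyoza
df = pd.DataFrame({'gameString': list('aabbb'), 'pitcherId': list('ccddd')})

df['last_pitch'] = ~df.duplicated(subset=['gameString', 'pitcherId'], keep='last')

print(df)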

  gameString pitcherId  last_pitch
0          a         c       False
1          a         c        True
2          b         d       False
3          b         d       False
4          b         d        True
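Because the question asks for the last pitch per pitcher *per game*, the duplicate check needs to look at both `gameString` and `pitcherId`; keyed on `pitcherId` alone, only a pitcher's final pitch across all games is flagged. The sketch below compares the two on a small hypothetical frame in which one pitcher appears in two games, assuming rows are already in pitch order.

```python
import pandas as pd

# Hypothetical sample: pitcher 'p1' pitches in both games
df = pd.DataFrame({
    'gameString': ['g1', 'g1', 'g2', 'g2'],
    'pitcherId':  ['p1', 'p1', 'p1', 'p1'],
})

# Keyed on pitcherId only: a single True for 'p1' across all games
by_pitcher = ~df['pitcherId'].duplicated(keep='last')

# Keyed on (gameString, pitcherId): one True per pitcher per game
df['last_pitch'] = ~df.duplicated(subset=['gameString', 'pitcherId'], keep='last')

print(by_pitcher.tolist())        # [False, False, False, True]
print(df['last_pitch'].tolist())  # [False, True, False, True]
```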

If you really wish to use GroupBy, you can use the last method:

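# assumes a default, unnamed index, so reset_index() creates a column called 'index'
idx = df.reset_index().groupby(['gameString', 'pitcherId'])['index'].last().values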

df['last_pitch'] = df.index.isin(idx)
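A runnable sketch of the `last` approach, grouping on both key columns so the flag is per game. It assumes `df` has a default unnamed `RangeIndex` (so `reset_index()` produces a column named `index`) and that rows are in pitch order; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample: pitcher 'p1' appears in two games
df = pd.DataFrame({
    'gameString': ['g1', 'g1', 'g1', 'g2', 'g2'],
    'pitcherId':  ['p1', 'p1', 'p2', 'p1', 'p1'],
})

# Last original index label within each (game, pitcher) group
idx = (df.reset_index()
         .groupby(['gameString', 'pitcherId'])['index']
         .last()
         .values)

df['last_pitch'] = df.index.isin(idx)
print(df['last_pitch'].tolist())  # [False, True, True, False, True]
```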

Upvotes: 1

Shaido
Shaido

Reputation: 28367

One way is to find the index labels of the rows you want to change with tail and then use loc to set them in the original dataframe:

last_rows = data.groupby(['gameString', 'pitcherId']).tail(n=1)
data.loc[last_rows.index, 'last_pitch'] = True
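A self-contained sketch of the `tail` approach on hypothetical sample data. It initialises `last_pitch` to False first, as the question does, and assumes the index is unique; with duplicate labels, `data.loc[last_rows.index, ...]` would also flag other rows sharing those labels.

```python
import pandas as pd

# Hypothetical sample: pitcher 'p1' appears in two games
data = pd.DataFrame({
    'gameString': ['g1', 'g1', 'g1', 'g2', 'g2'],
    'pitcherId':  ['p1', 'p1', 'p2', 'p1', 'p1'],
})

# Initialise the flag first, as in the question
data['last_pitch'] = False

# tail(n=1) keeps the final row of each group with its original index label
last_rows = data.groupby(['gameString', 'pitcherId']).tail(n=1)
data.loc[last_rows.index, 'last_pitch'] = True

print(data['last_pitch'].tolist())  # [False, True, True, False, True]
```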

Upvotes: 0
