pandas calculation grouping by ID

Question

Is it possible to apply a calculation to a DF by ID. For example if multiple records share a similar ID make that calculation but do it throughout the DF?

df['Total Volume'] = df.groupby('ID')[df['VOLUME'].shift(-1) + df['ADDED'] - df['VOLUME']]

So I want to create the column "Total Volume" using the calculation

df['VOLUME'].shift(-1) + df['ADDED'] - df['VOLUME']

I've had some success with this logic:

df['Total Volume'] = df['VOLUME'].shift(-1) + df['ADDED'] - df['VOLUME']

But it doesn't take into account for the ID and the first record is always a total from another ID.

Ultimately the column should be blank or empty for the first record with every new ID.

user7864386 · Accepted Answer

IIUC, you want to restrict the calculations to each "ID", right? Then you could use groupby + shift on "VOLUME" and do everything else the same.

df['TOTAL VOLUME'] = df.groupby('ID')['VOLUME'].shift(-1) + df['ADDED'] - df['VOLUME']

Note that shift(-1) moves the next record up, so the end result is the last record in each "ID" is empty. Since you say:

Ultimately the column should be blank or empty for the first record with every new ID

I think you want shift(1) instead (by default it's 1):

df['TOTAL VOLUME'] = df.groupby('ID')['VOLUME'].shift() + df['ADDED'] - df['VOLUME']

Then again, the above code can be written even simpler using groupby + diff:

df['TOTAL VOLUME'] = df['ADDED'] - df.groupby('ID')['VOLUME'].diff()

pandas calculation grouping by ID

Answers (1)

Related Questions