user16400720
user16400720

Reputation:

pandas calculation grouping by ID

Is it possible to apply a calculation to a DF by ID. For example if multiple records share a similar ID make that calculation but do it throughout the DF?

df['Total Volume'] = df.groupby('ID')[df['VOLUME'].shift(-1) + df['ADDED'] - df['VOLUME']]

So I want to create the column "Total Volume" using the calculation

df['VOLUME'].shift(-1) + df['ADDED'] - df['VOLUME']

I've had some success with this logic:

df['Total Volume'] = df['VOLUME'].shift(-1) + df['ADDED'] - df['VOLUME']

But it doesn't take into account for the ID and the first record is always a total from another ID.

Ultimately the column should be blank or empty for the first record with every new ID.

Upvotes: 0

Views: 252

Answers (1)

user7864386
user7864386

Reputation:

IIUC, you want to restrict the calculations to each "ID", right? Then you could use groupby + shift on "VOLUME" and do everything else the same.

df['TOTAL VOLUME'] = df.groupby('ID')['VOLUME'].shift(-1) + df['ADDED'] - df['VOLUME']

Note that shift(-1) moves the next record up, so the end result is the last record in each "ID" is empty. Since you say:

Ultimately the column should be blank or empty for the first record with every new ID

I think you want shift(1) instead (by default it's 1):

df['TOTAL VOLUME'] = df.groupby('ID')['VOLUME'].shift() + df['ADDED'] - df['VOLUME']

Then again, the above code can be written even simpler using groupby + diff:

df['TOTAL VOLUME'] = df['ADDED'] - df.groupby('ID')['VOLUME'].diff()

Upvotes: 1

Related Questions