Benson_YoureFired
Benson_YoureFired

Reputation: 434

Calculate aggregate value of column row by row

My apologies for the vague title, it's complicated to translate what I want in writing terms.

I'm trying to build a filled line chart with the date on x axis and total transaction over time on the y axis

My data

The object is a pandas dataframe.

date         | symbol | type | qty |   total
----------------------------------------------
2020-09-10    ABC       Buy     5      10     
2020-10-18    ABC       Buy     2      20      
2020-09-19    ABC       Sell    3      15    
2020-11-05    XYZ       Buy     10     8       
2020-12-03    XYZ       Buy     10     9      
2020-12-05    ABC       Buy     2      5     

What I whant

date         | symbol | type | qty |   total  | aggregate_total
------------------------------------------------------------
2020-09-10    ABC       Buy     5      10          10
2020-10-18    ABC       Buy     2      20         10+20 = 30
2020-09-19    ABC       Sell    3      15        10+20-15 = 15
2020-11-05    XYZ       Buy     10     8         8
2020-12-03    XYZ       Buy     10     9         8+9 = 17
2020-12-05    ABC       Buy     2      5          10+20-15+5 = 20

Where I am now

I'm working with 2 nested for loops : one for iterating over the symbols, one for iterating each row. I store the temporary results in lists. I'm still unsure how I will add the results to the final dataframe. I could reorder the dataframe by symbol and date, then append each temp lists together and finally assign that temp list to a new column.

The code below is just the inner loop over the rows.

af = df.loc[df['symbol'] == 'ABC']


 for i in (range(0,af.shape[0])):
       # print(af.iloc[0:i,[2,4]])

        # if type is a buy, we add the last operation to the aggregate
        if af.iloc[i,2] == "BUY":
            temp_agg_total.append(temp_agg_total[i] + af.iloc[i,4])
            temp_agg_qty.append(temp_agg_qty[i] + af.iloc[i, 3])

        else:
            temp_agg_total.append(temp_agg_total[i] - af.iloc[i,4])
            temp_agg_qty.append(temp_agg_qty[i] - af.iloc[i, 3])

        # Remove first element of list (0)
        temp_agg_total.pop(0)
        temp_agg_qty.pop(0)

    af = af.assign(agg_total = temp_agg_total,
              agg_qty = temp_agg_qty)

My question

Is there a better way to do this in pandas or numpy ? It feels really heavy for something relatively simple.

The presence of the Buy/Sell type of operation complicates things.

Regards

Upvotes: 0

Views: 117

Answers (3)

Chris
Chris

Reputation: 16147

# negate qty of Sells
df.loc[df['type']=='Sell', 'total'] *=-1
# cumulative sum of the qty based on symbol
df['aggregate_total'] = df.groupby('symbol')['total'].cumsum()

Upvotes: 2

ombk
ombk

Reputation: 2111

df["Type_num"] = df["type"].map({"Buy":1,"Sell":-1})
df["Num"] = df.Type_num*df.total
df.groupby(["symbol"],as_index=False)["Num"].cumsum()
pd.concat([df,df.groupby(["symbol"],as_index=False)["Num"].cumsum()],axis=1)

        date    symbol  type    qty   total Type_num    Num CumNum
0   2020-09-10   ABC     Buy     5     10      1        10   10
1   2020-10-18   ABC     Buy     2     20      1        20   30
2   2020-09-19   ABC     Sell    3     15     -1       -15   15
3   2020-11-05   XYZ     Buy     10    8       1        8    8
4   2020-12-03   XYZ     Buy     10    9       1        9    17
5   2020-12-05   ABC     Buy     2     5       1        5    20

The most important thing here is the cumulative sum. The regrouping is used to make sure that the cumulative sum is just performed on each kind of different symbol. The renaming and dropping of columns should be easy for you.

Trick is that I made {sell; buy} into {1,-1}

Upvotes: 1

Adán H Baena
Adán H Baena

Reputation: 66

Is this which you're looking for?

df['Agg'] = 1
df.loc[df['type'] == 'Sell', 'Agg'] = -1
df['Agg'] = df['Agg']*df['total']
df['Agg'].cumsum()

Upvotes: 1

Related Questions