ds882
ds882

Reputation: 105

Dataframe cell to be locked and used for a running balance calculation conditional of result on another cell on same row

Say I have the following dataframe:

import pandas as pd
df = pd.DataFrame()

df['A'] = ('1/05/2019','2/05/2019','3/05/2019','4/05/2019','5/05/2019','6/05/2019','7/05/2019','8/05/2019','9/05/2019','10/05/2019','11/05/2019','12/05/2019','13/05/2019','14/05/2019','15/05/2019','16/05/2019','17/05/2019','18/05/2019','19/05/2019','20/05/2019')

df['B'] = ('SIT','SCLOSE', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SELL','SIT','SIT','BCLOSE', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL','BUY','SIT','SIT')

df['C'] = (0.00,1.00,10.00, 5.00,6.00,-6.00, 6.00, 0.00,0.00,0.00,-8.00,33.00,-15.00,6.00,-1.00,5.00,10.00,0.00,0.00,0.00)

df.loc[19, 'D'] = 100.0000

As can be seen I am starting column D with 100 at the last row.

I am trying to code a calculation for column D so starting from the bottom row (row 19) when a BUY or SELL is shown on column B then the number on column D is locked (eg the 100) and used for a calculation based on col C for each SHODL or BHODL until the row after a BCLOSE or an SCLOSE is shown.

The locked number is used to calculate a running balance based on the percentages that are in column C. As you can see on row 16 column C has '10' representing 10%. As 10% of 100 = 10 the new runnning balance is 110.

Row 15 column C has 5% as such 5 is added to the running balance to result in 115.

The next row 14 column C has a -1% change as such 1% of 100 is = 1 and therefore the new running balance is 114 and so on.

The following are the results that should be returned in col D of the dataframe once the right code is run

df['D'] = ('158.60','158.60', '157.30', '144.30', '137.80', '130.00', '137.80', '130.00','130.00','130.00','130.00', '138.00', '105.00', '120.00', '114.00', '115.00', '110.00','100.00','100.00','100.00')

This continues until after a SCLOSE or a BCLOSE is shown as a BCLOSE or SCLOSE row is the final row where the running balance is calculated.

As you can see this process is restarted when either a new BUY or SELL is shown.

Upvotes: 6

Views: 781

Answers (3)

Arn
Arn

Reputation: 2015

This piece below should help you out. It produces the expected output and it's relatively fast because it avoids the direct iteration on the dataframe's rows.

endpoints = [df.first_valid_index(), df.last_valid_index()]
# occurrences of 'BCLOSE' or 'SCLOSE'
breakpoints = df.index[(df.B =='BCLOSE') | (df.B == 'SCLOSE')][::-1]
# remove the endpoints of the dataframe that do not break the structure
breakpoints = breakpoints.drop(endpoints, errors='ignore')

PERCENTAGE_CONST = 100
top = 100  # you can specify any initial value here

for i in range(len(breakpoints) + 1):

    prv = breakpoints[i - 1] - 1 if i else -1  # previous or first breakpoint
    try:
        nex = breakpoints[i] - 1  # next breakpoint
    except IndexError:
        nex = None  # last breakpoint

    # cumulative sum of appended to 'D' column
    res = top + (df['C'][prv: nex: -1] * top / PERCENTAGE_CONST).cumsum()[::-1]
    df.loc[res.index, 'D'] = res

    # saving the value that will be the basis for percentage calculations
    # for the next breakpoint
    top = res.iloc[0]

Upvotes: 0

Andy L.
Andy L.

Reputation: 25259

Next starting value depends on the last value of previous group, so I think it can't be vectorized. It requires some kind of iterative process. I came up with solution doing iteratively on groups of groupby. Reverse df and assign to df1. Working on each group of df1 and assign the final list of groups to the original df

df1 = df[::-1]
s = df1.B.isin(['BCLOSE','SCLOSE']).shift(fill_value=False).cumsum()
grps = df1.groupby(s)
init_val= 100
l = []
for _, grp in grps:
    s = grp.C * 0.01 * init_val
    s.iloc[0] = init_val
    s = s.cumsum()
    init_val = s.iloc[-1]
    l.append(s)

df['D'] = pd.concat(l)

Out[50]:
             A       B     C      D
0    1/05/2019     SIT   0.0  158.6
1    2/05/2019  SCLOSE   1.0  158.6
2    3/05/2019   SHODL  10.0  157.3
3    4/05/2019   SHODL   5.0  144.3
4    5/05/2019   SHODL   6.0  137.8
5    6/05/2019   SHODL  -6.0  130.0
6    7/05/2019   SHODL   6.0  137.8
7    8/05/2019    SELL   0.0  130.0
8    9/05/2019     SIT   0.0  130.0
9   10/05/2019     SIT   0.0  130.0
10  11/05/2019  BCLOSE  -8.0  130.0
11  12/05/2019   BHODL  33.0  138.0
12  13/05/2019   BHODL -15.0  105.0
13  14/05/2019   BHODL   6.0  120.0
14  15/05/2019   BHODL  -1.0  114.0
15  16/05/2019   BHODL   5.0  115.0
16  17/05/2019   BHODL  10.0  110.0
17  18/05/2019     BUY   0.0  100.0
18  19/05/2019     SIT   0.0  100.0
19  20/05/2019     SIT   0.0  100.0        

Upvotes: 2

nishant
nishant

Reputation: 925

I think there is a more optimized and pythonic way to solve this. But a solution with iteration:

df['D'] = pd.to_numeric(df['D'])
df['C'] = pd.to_numeric(df['C'])
D_val = None
for i in range(len(df)-1, 0, -1):
    if df.loc[i, 'B'] == 'BUY':
        D_val = df.loc[i, 'D']
        continue
    if D_val is None:
        continue
    df.loc[i, 'D'] = df.loc[i+1, 'D'] + (D_val * df.loc[i, 'C']/100)

Everytime you encounter a BUY in column D, you update the D_val. We can also have a condition to stop as mentioned by OP such as SCLOSE or BCLOSE.

Upvotes: 0

Related Questions