mark
mark

Reputation: 95

Loop through rows and columns to calculate a compounded rate in a new DataFrame

I have been struggling to run a loop through each row and column. When looping through each row, I want to calculate a compound rate of return.

There are two different DataFrames (df1 and df2), where df1 shows stock symbols and df2 show their respective prices. I am trying to build a new DataFrame (df3) based on the 'if statements' listed below.

First DataFrame = df1

          Date     1         2         3       4      5 
  0 2000-12-05  PXX.TO    MX.TO    CAE.TO   HRX.TO  FR.TO
  1 2000-12-06  PXX.TO    MX.TO    CAE.TO   HRX.TO  FR.TO   
  2 2000-12-07  FTS.TO    MX.TO    CAE.TO   HRX.TO  FR.TO   
  3 2000-12-08  FTS.TO    MX.TO    CAE.TO   HRX.TO  FR.TO
  4 2000-12-09  FTS.TO    G.TO     CAE.TO   HRX.TO  TB.TO
  5 2000-12-10  FTS.TO    G.TO     KYU.TO   HRX.TO  TB.TO
  6 2000-12-11  FTS.TO    G.TO     KYU.TO   HRX.TO  TB.TO   
  7 2000-12-12  BAM-A.TO  G.TO     KYU.TO   HRX.TO  TB.TO   
  8 2000-12-13  BAM-A.TO  PLI.TO   KYU.TO   HRX.TO  TB.TO
  9 2000-12-14  BAM-A.TO  PLI.TO   KYU.TO   HRX.TO  TB.TO  
 10 2000-12-15  BAM-A.TO  PLI.TO   KYU.TO   HRX.TO  TB.TO

Second DataFrame = df2

          Date    1       2       3        4      5
  0 2000-12-05  2.3     60.10   2.30    34.98   35.00
  1 2000-12-06  2.35    60.70   2.38    35.43   35.01
  2 2000-12-07  56.76   61.31   2.46    35.89   35.02
  3 2000-12-08  57.33   61.92   2.54    36.35   35.04
  4 2000-12-09  57.90   100.20  2.63    36.83   300.90
  5 2000-12-10  58.48   101.00  69.56   37.30   304.18
  6 2000-12-11  59.07   101.81  70.46   37.78   307.50
  7 2000-12-12  4.50    102.62  71.37   38.27   310.85
  8 2000-12-13  4.54    44.50   72.29   38.77   314.24
  9 2000-12-14  4.57    45.39   73.23   39.27   317.66
 10 2000-12-15  4.61    46.30   74.18   39.78   321.12

Desired Output = df3

           Date     1      2       3       4      5
   0 2000-12-05 1.0000  1.0000  1.0000  1.0000  1.0000
   1 2000-12-06 1.0200  1.0100  1.0340  1.0129  1.0003
   2 2000-12-07 1.0200  1.0201  1.0692  1.0260  1.0007
   3 2000-12-08 1.0302  1.0303  1.1055  1.0393  1.0010
   4 2000-12-09 1.0405  1.0303  1.1431  1.0528  1.0010
   5 2000-12-10 1.0509  1.0385  1.1431  1.0664  1.0119
   6 2000-12-11 1.0614  1.0469  1.1579  1.0802  1.0230
   7 2000-12-12 1.0614  1.0552  1.1729  1.0941  1.0341
   8 2000-12-13 1.0699  1.0552  1.1880  1.1083  1.0454
   9 2000-12-14 1.0785  1.0763  1.2034  1.1226  1.0568
  10 2000-12-15 1.0871  1.0979  1.2190  1.1371  1.0683

Below show the formulas for the values in df3 for column 1

df3.row[0]  = 1
df3.row[1]  = (2.35/2.30) * 1 = 1.0200
df3.row[2]  = (56.76/56.76) * 1.0200 = 1.0200
df3.row[3]  = (57.33/56.76) * 1.0200 = 1.0302
df3.row[4]  = (57.90/57.33) * 1.0302 = 1.0405
df3.row[5]  = (58.48/57.90) * 1.0405 = 1.0509
df3.row[6]  = (59.07/58.48) * 1.0509 = 1.0614
df3.row[7]  = (4.50/4.50) * 1.0614 = 1.0614
df3.row[8]  = (4.54/4.50) * 1.0614 = 1.0699
df3.row[9]  = (4.57/4.54) * 1.0699 = 1.0785
df3.row[10] = (4.61/4.57) * 1.0785 = 1.0871

Below is what I have so far. Not confident that this is the best approach.

StartFromDay = 1
NumOfHoldings = 10
df3 = pd.DataFrame(columns = np.arange(1,NumOfHoldings+1))
df3.index.names = ['Date']

for col in df1.columns:

    #First row should equal 1
    df3.iloc[0][col] == 1

    for i in range(StartFromDay, len(df1)):

       #first row of each column
       prevrow = df1.iloc[0][col]
       if df1.iloc[i][col] == prevrow:
           ###### If Statements to calculate compound return#######

Upvotes: 0

Views: 730

Answers (1)

John Zwinck
John Zwinck

Reputation: 249462

Loops are slow, so we'll do it in a vectorized way. First, set the indexes appropriately:

df1.set_index('Date', inplace=True)
df2.set_index('Date', inplace=True) 

Next, generate a boolean mask which is True wherever the symbol is the same:

same_stock = df1.iloc[1:].values == df1.iloc[:-1].values

We have to use values because the shifted series are not aligned on the index anymore.

And make a matrix with all the df2.row[1]/df2.row[0] values:

ret = df2.iloc[1:].values / df2.iloc[:-1].values

Next, replace the returns where the symbol changed:

ret[~same_stock] = 1 # pretend return is flat when symbol changed

Now create a DataFrame with the result:

simpret = pd.DataFrame(np.vstack(([1,1,1,1,1], ret)), df1.index)
df3 = simpret.cumprod()

Upvotes: 1

Related Questions