Reputation: 95
I have been struggling to run a loop through each row and column. When looping through each row, I want to calculate a compound rate of return.
There are two different DataFrames (df1 and df2), where df1 shows stock symbols and df2 show their respective prices. I am trying to build a new DataFrame (df3) based on the 'if statements' listed below.
First DataFrame = df1
Date 1 2 3 4 5
0 2000-12-05 PXX.TO MX.TO CAE.TO HRX.TO FR.TO
1 2000-12-06 PXX.TO MX.TO CAE.TO HRX.TO FR.TO
2 2000-12-07 FTS.TO MX.TO CAE.TO HRX.TO FR.TO
3 2000-12-08 FTS.TO MX.TO CAE.TO HRX.TO FR.TO
4 2000-12-09 FTS.TO G.TO CAE.TO HRX.TO TB.TO
5 2000-12-10 FTS.TO G.TO KYU.TO HRX.TO TB.TO
6 2000-12-11 FTS.TO G.TO KYU.TO HRX.TO TB.TO
7 2000-12-12 BAM-A.TO G.TO KYU.TO HRX.TO TB.TO
8 2000-12-13 BAM-A.TO PLI.TO KYU.TO HRX.TO TB.TO
9 2000-12-14 BAM-A.TO PLI.TO KYU.TO HRX.TO TB.TO
10 2000-12-15 BAM-A.TO PLI.TO KYU.TO HRX.TO TB.TO
Second DataFrame = df2
Date 1 2 3 4 5
0 2000-12-05 2.3 60.10 2.30 34.98 35.00
1 2000-12-06 2.35 60.70 2.38 35.43 35.01
2 2000-12-07 56.76 61.31 2.46 35.89 35.02
3 2000-12-08 57.33 61.92 2.54 36.35 35.04
4 2000-12-09 57.90 100.20 2.63 36.83 300.90
5 2000-12-10 58.48 101.00 69.56 37.30 304.18
6 2000-12-11 59.07 101.81 70.46 37.78 307.50
7 2000-12-12 4.50 102.62 71.37 38.27 310.85
8 2000-12-13 4.54 44.50 72.29 38.77 314.24
9 2000-12-14 4.57 45.39 73.23 39.27 317.66
10 2000-12-15 4.61 46.30 74.18 39.78 321.12
Desired Output = df3
Date 1 2 3 4 5
0 2000-12-05 1.0000 1.0000 1.0000 1.0000 1.0000
1 2000-12-06 1.0200 1.0100 1.0340 1.0129 1.0003
2 2000-12-07 1.0200 1.0201 1.0692 1.0260 1.0007
3 2000-12-08 1.0302 1.0303 1.1055 1.0393 1.0010
4 2000-12-09 1.0405 1.0303 1.1431 1.0528 1.0010
5 2000-12-10 1.0509 1.0385 1.1431 1.0664 1.0119
6 2000-12-11 1.0614 1.0469 1.1579 1.0802 1.0230
7 2000-12-12 1.0614 1.0552 1.1729 1.0941 1.0341
8 2000-12-13 1.0699 1.0552 1.1880 1.1083 1.0454
9 2000-12-14 1.0785 1.0763 1.2034 1.1226 1.0568
10 2000-12-15 1.0871 1.0979 1.2190 1.1371 1.0683
Below show the formulas for the values in df3 for column 1
df3.row[0] = 1
df3.row[1] = (2.35/2.30) * 1 = 1.0200
df3.row[2] = (56.76/56.76) * 1.0200 = 1.0200
df3.row[3] = (57.33/56.76) * 1.0200 = 1.0302
df3.row[4] = (57.90/57.33) * 1.0302 = 1.0405
df3.row[5] = (58.48/57.90) * 1.0405 = 1.0509
df3.row[6] = (59.07/58.48) * 1.0509 = 1.0614
df3.row[7] = (4.50/4.50) * 1.0614 = 1.0614
df3.row[8] = (4.54/4.50) * 1.0614 = 1.0699
df3.row[9] = (4.57/4.54) * 1.0699 = 1.0785
df3.row[10] = (4.61/4.57) * 1.0785 = 1.0871
Below is what I have so far. Not confident that this is the best approach.
StartFromDay = 1
NumOfHoldings = 10
df3 = pd.DataFrame(columns = np.arange(1,NumOfHoldings+1))
df3.index.names = ['Date']
for col in df1.columns:
#First row should equal 1
df3.iloc[0][col] == 1
for i in range(StartFromDay, len(df1)):
#first row of each column
prevrow = df1.iloc[0][col]
if df1.iloc[i][col] == prevrow:
###### If Statements to calculate compound return#######
Upvotes: 0
Views: 730
Reputation: 249462
Loops are slow, so we'll do it in a vectorized way. First, set the indexes appropriately:
df1.set_index('Date', inplace=True)
df2.set_index('Date', inplace=True)
Next, generate a boolean mask which is True wherever the symbol is the same:
same_stock = df1.iloc[1:].values == df1.iloc[:-1].values
We have to use values
because the shifted series are not aligned on the index anymore.
And make a matrix with all the df2.row[1]/df2.row[0]
values:
ret = df2.iloc[1:].values / df2.iloc[:-1].values
Next, replace the returns where the symbol changed:
ret[~same_stock] = 1 # pretend return is flat when symbol changed
Now create a DataFrame with the result:
simpret = pd.DataFrame(np.vstack(([1,1,1,1,1], ret)), df1.index)
df3 = simpret.cumprod()
Upvotes: 1