Reputation: 1682
I am new to python pandas. And i wanted to estimate a value of inflow payment/s over term considering fees and growth over a period. I only used one payment (inflow) to test. Sometimes fee2 can be applied up to period n-t. i.e. Not for the whole period, n.
I did it like below, just wondering if there is a better way to recalculate the values without looping?
Python code:
import pandas as pd
import numpy as np
def getCashFlows():
term = 2
growthRate = (1+0.06)**(1/12) - 1
df = pd.DataFrame(list(range(1,term*12+1)), columns=['t'])
df['Value_t_1'] = 0
df['Inflow1']=0
df['growth']=0
df['ValuePlusGrowth'] = 0
df['fee1']=0
df['fee2']=30
df['Value_t']=0
df.set_value(0, 'Inflow1', 10000)
for i in range(0,term*12):
df['Value_t_1'] = df['Value_t'].shift()
df['Value_t_1'].fillna(0,inplace=True)
df['growth'] = (df['Value_t_1'] + df['Inflow1'])*growthRate
df['ValuePlusGrowth'] = df['Value_t_1']+df['Inflow1']+df['growth']
df['fee1']=df['ValuePlusGrowth']*0.5/100
df['Value_t'] = df['ValuePlusGrowth'] - df['fee1'] - df['fee2']
return df
Upvotes: 3
Views: 2091
Reputation: 21274
The only initial input that's really needed is the initial value for inflow
. Everything else can be reduced to operations that are repeated for a set number of times, based on the row index. And some of the columns in your data frame are really just constants.
Here's a solution that clarifies the operations needed to compute each row of the data frame:
import pandas as pd
class GrowthTracker(object):
def __init__(self, n_iter):
self.colnames = ['Value_t_1', 'growth', 'ValuePlusGrowth', 'fee1', 'Value_t']
self.data = None
self.fee1_mult = 0.5/100
self.fee2 = (0,0,0,0,30)
self.growthRate = (1+0.06)**(1/12) - 1
self.n_iter = n_iter
self.ops = pd.Series([1, # Value_t_1
self.growthRate, # growth
(1 + self.growthRate), # ValuePlusGrowth
(1 + self.growthRate) * self.fee1_mult, # fee1
(1 + self.growthRate) * (1 - self.fee1_mult) # Value_t
])
def update(self, t, n, df=None):
row = self.ops.mul(t).subtract(self.fee2)
tmp = pd.concat([df, row], axis = 1, ignore_index=True)
if n < self.n_iter:
self.data = self.update(row.iloc[-1], n+1, tmp)
return self.data
else:
tmp.iloc[0,0] = 0 # remove the initial 10000 from Value_t_1
self.data = tmp.T
self.data.columns = self.colnames
return self.data
Now just set the initial values, instantiate the GrowthTracker
object, and update()
:
total_iter = 23
tracker = GrowthTracker(n_iter=total_iter)
inflow = 10000
start_index = 0
tracker.update(t=inflow, n=start_index)
tracker.data
Value_t_1 growth ValuePlusGrowth fee1 Value_t
0 0.000000 48.675506 10048.675506 50.243378 9968.432128
1 9968.432128 48.521847 10016.953976 50.084770 9936.869206
2 9936.869206 48.368213 9985.237419 49.926187 9905.311232
3 9905.311232 48.214603 9953.525835 49.767629 9873.758206
4 9873.758206 48.061017 9921.819223 49.609096 9842.210127
5 9842.210127 47.907455 9890.117583 49.450588 9810.666995
6 9810.666995 47.753918 9858.420912 49.292105 9779.128808
7 9779.128808 47.600404 9826.729212 49.133646 9747.595566
8 9747.595566 47.446914 9795.042480 48.975212 9716.067268
9 9716.067268 47.293449 9763.360716 48.816804 9684.543913
10 9684.543913 47.140007 9731.683920 48.658420 9653.025500
11 9653.025500 46.986590 9700.012090 48.500060 9621.512030
12 9621.512030 46.833196 9668.345226 48.341726 9590.003500
13 9590.003500 46.679827 9636.683327 48.183417 9558.499910
14 9558.499910 46.526482 9605.026392 48.025132 9527.001260
15 9527.001260 46.373160 9573.374420 47.866872 9495.507548
16 9495.507548 46.219863 9541.727411 47.708637 9464.018774
17 9464.018774 46.066590 9510.085364 47.550427 9432.534937
18 9432.534937 45.913341 9478.448278 47.392241 9401.056037
19 9401.056037 45.760116 9446.816152 47.234081 9369.582072
20 9369.582072 45.606915 9415.188986 47.075945 9338.113041
21 9338.113041 45.453737 9383.566779 46.917834 9306.648945
22 9306.648945 45.300584 9351.949529 46.759748 9275.189781
23 9275.189781 45.147455 9320.337237 46.601686 9243.735551
I found it easier to express this all as a class, but it's simple enough to just define variables outside of the class and then just run the update()
function.
UPDATE
Here's a bit more explanation behind this solution:
The initial data frame df
is mostly empty. The only fully non-zero columns are t
, which is never used, and fee2
, which is a constant (fee2 = 30
). The entire remainder of df
starts out as zero-valued, with the one exception of the first cell in Inflow1
- its first value is 10000
, and the rest of its values are zeros.
That means that, in terms of what we need to accomplish computationally, we can limit our "matrix of interest" to the columns Value_t_1
, growth
, ValuePlusGrowth
, fee1
, and Value_t
.
We can think of that first Inflow1
value as the seed - everything else is just a string of operations performed on the number 10000
. (In fact, we don't actually need Inflow1
as a field, as all of its other values remain zero throughout the calculations.)
In your loop, you initially updated columns using the values of other columns. That makes sense, it's probably how I would have done it too - looks neat and efficient. Recall, however, that every update is really just a string of math tracing its lineage back to that original 10000
. Writing out the actual operations for each column update, instead of using other column names, shows how each update operation can be simplified.
First, a few shorthand notations:
t = Value_t from previous row (in case of the first row, Value_t = Inflow1 = 10000)
t1 = Value_t_1
g = growth
inf = Inflow1
vpg = ValuePlusGrowth
gr = growthRate # gr is a constant: (1+0.06)**(1/12) - 1
f1X = 0.5/100
new_t = Value_t for current row
We start with t = 10000
. Everything else is some operation on t
.
Every value can be expressed in terms of what we need to multiply t
by in order to get the desired value (with one exception that I'll get to later). So for example:
df['Value_t_1'] = df['Value_t'].shift()
df['Value_t_1'].fillna(0,inplace=True)
# equivalent to:
t1 = 1 * t # recall t is the shifted Value_t from the previous row
Keep in mind that we only need to drop in the seed value t
once, and then it's just operations on the seed to populate all of df
. That means that the operations in your loop can be expressed as "the terms that need to be multiplied by t in order to get the correct column value". So although we have shown that t1 = 1 * t
, it's more useful for us to think of t1 = 1
- eventually we'll multiply that by t
, but the right side of that equation represents t1
's relationship to t
.
Then:
t1 = 1
Next:
# Inflow1 is always 0, except for its initial value which we capture in initial t, so:
df['growth'] = (df['Value_t_1'] + df['Inflow1'])*growthRate
# becomes:
g = t1 * gr
# with t1 = 1
g = gr
# we know t1 = 1, and inf is never used as a multiplier, so:
df['ValuePlusGrowth'] = df['Value_t_1']+df['Inflow1']+df['growth']
# becomes:
vpg = 1 + g = 1 + gr
df['fee1']=df['ValuePlusGrowth']*0.5/100
# becomes:
fee1 = vpg * f1X = (1 + gr) * f1X
# we'll ignore subtracting fee2 for now, see notes below.
df['Value_t'] = df['ValuePlusGrowth'] - df['fee1'] - df['fee2']
# becomes:
new_t = vpg - fee1 = (1 + gr) - ((1 + gr) * f1X) = (1 + gr) * (1 - f1X)
ops = (t1, g, vpg, fee1, new_t)
Now, for each row, we have a set of update operations ops
for each column. Given that we have t
from the previous row, we can populate values for each row with:
new_row = t * ops
We still need to subtract fee2
from new_t
, and that doesn't fit neatly into what, up until this point, had been a series of multiplicative operations. But we can stick with our vectorized formulation and define:
fee2 = (0,0,0,0,30)
After each new_row
we subtract the fee2
vector from the new_row
vector, which really just subtracts fee2
from new_t
as desired.
new_row = t * ops - fee2
At this point, we just need a function that starts with t = 10000
and keeps executing the new_row
formula, building on each previous row, until we've reached our desired number of iterations. I chose a recursive strategy to do this, and saved each new_row
to a data frame at every recursive step.
Finally, since I somewhat abused your original notation by setting t = 10000
instead of Inflow1 = 10000
, this means the first t1
value is incorrectly set at 10000
. At the end of the update()
function we set this first t1
value back to 0
.
Upvotes: 2