Reputation: 13475
I have some code that I wrote in R that I would like to have translated into Python, but am new to python so need a bit of help
The R code basically simulates 250 random normals, and then calculated a geometric mean return of sorts and then a max drawdown, it does this 10000 times and then combines the results, as shown below.
mu <- 0.06
sigma <- 0.20
days <- 250
n <- 10000
v <- do.call(rbind,lapply(seq(n),function(y){
rtns <- rnorm(days,mu/days,sqrt(1/days)*sigma)
p.rtns <- cumprod(rtns+1)
p.rtns.md <- min((p.rtns/cummax(c(1,p.rtns))[-1])-1)
tot.rtn <- p.rtns[days]-1
c(tot.rtn,p.rtns.md)
}))
This is my attempt in Python, (if you can make it shorter/more eloquent/more efficient please suggest as answer)
import numpy as np
import pandas as pd
mu = float(0.06)
sigma = float(0.2)
days = float(250)
n = 10000
rtns = np.random.normal(loc=mu/days,scale=(((1/days)**0.5)*sigma),size=days)
rtns1 = rtns+1
prtns = rtns1.cumprod()
totrtn = prtns[len(prtns)-1] -1
h = prtns.tolist()
h.insert(0,float(1))
hdf = pd.DataFrame(prtns)/(pd.DataFrame(h).cummax()[1:len(h)]-1))[1:len(h)]]
and that was as far as I got... wasn't too sure if hdf
was correct to get p.rtns.md
, and wasnt sure how I would go about simulating this 10000 times.
All suggestions would be greatly appreciated...
Upvotes: 6
Views: 2656
Reputation: 54330
First, your last line of code:
hdf = pd.DataFrame(prtns)/(pd.DataFrame(h).cummax()[1:len(h)]-1))[1:len(h)]]
can't be right. May be this according to your R
code:
hdf = (pd.DataFrame(prtns)/(pd.DataFrame(h).cummax()[1:len(h)])-1)[1:len(h)]
Second, c(1,p.rtns)
can be replaced with np.hstack(1, prtns)
instead of converting the np.array
to list
.
Third, looks like you are using pandas
just for the cummax()
. It is not hard to implement one, like this:
def cummax(a):
ac=a.copy()
if a.size>0:
max_idx=np.argmax(a)
ac[max_idx:]=np.max(ac)
ac[:max_idx]=cummax(ac[:max_idx])
else:
pass
return ac
And:
>>> a=np.random.randint(0,20,size=10)
>>> a
array([15, 15, 15, 8, 5, 14, 6, 18, 9, 1])
>>> cummax(a)
array([15, 15, 15, 15, 15, 15, 15, 18, 18, 18])
Take these all together we get:
def run_simulation(mu, sigma, days, n):
result=[]
for i in range(n):
rtns = np.random.normal(loc=1.*mu/days,
scale=(((1./days)**0.5)*sigma),
size=days)
p_rtns = (rtns+1).cumprod()
tot_rtn = p_rtns[-1]-1
#looks like you want the last element, rather than the 2nd form the last as you did
p_rtns_md =(p_rtns/cummax(np.hstack((0.,p_rtns)))[1:]-1).min()
#looks like you want to skip the first element, python is different from R for that.
result.append((tot_rtn, p_rtns_md))
return result
And:
>>> run_simulation(0.06, 0.2, 250,10)
[(0.096077511394818016, -0.16621830496112056), (0.73729333554192, -0.13566124517484235), (0.087761655465907973, -0.17862916081223446), (0.07434851091082928, -0.15972961033789046), (-0.094464694393288307, -0.2317397117033817), (-0.090720761054686627, -0.1454002204893271), (0.02221364097529932, -0.15606214341947877), (-0.12362835704696629, -0.24323096421682033), (0.023089144896788261, -0.16916790589553599), (0.39777037782177493, -0.10524624505023494)]
The loop is actually not necessary as we can work in two dimension by generating 2D array
of Guassian random variable (change size=days
to size=(days, n)
). Avoiding the loop will most likely be faster. However, that will require a different cummax()
function as this one show here is restricted to 1D. But cummax()
in R
is restricted to 1D as well (not exactly, if you pass a 2D to cummax()
, it will be flattened). So to keep things simple and comparable between Python
and R
, let's settle for the loop version.
Upvotes: 2
Reputation: 2359
I'm unfamiliar with R, but I see some general improvements that could be made to your Python code:
0.06
without float()
around, since Python will infer that a numeric value with a decimal point is a float
h.insert(0,float(1))
can be replaced with h.insert(0,1.0)
[-1]
, the second-last using [-2]
, etc.:
totrtn = prtns[-1] -1
Python developers usually choose underscores between words or camelcase. In addition, it is normally preferable to use the full words in your variable names for readability over economy on-screen. For example, some variables here could be renamed to returns
and total_returns
or totalReturns
.
To run your simulation 10000 times, you should use a for
loop:
for i in range(10000):
# code to be repeated 10000 goes in an indented block here
# more lines in the loop should be indented at same level as previous line
# to mark what code runs after the for loop finishes, just un-indent again
h - prtns.tolist()
...
Upvotes: 4