h.l.m
h.l.m

Reputation: 13475

R translation to Python

I have some code that I wrote in R that I would like to have translated into Python, but am new to python so need a bit of help

The R code basically simulates 250 random normals, and then calculated a geometric mean return of sorts and then a max drawdown, it does this 10000 times and then combines the results, as shown below.

mu <- 0.06
sigma <- 0.20
days <- 250
n <- 10000
v <- do.call(rbind,lapply(seq(n),function(y){
  rtns <- rnorm(days,mu/days,sqrt(1/days)*sigma)
  p.rtns <- cumprod(rtns+1)
  p.rtns.md <- min((p.rtns/cummax(c(1,p.rtns))[-1])-1)
  tot.rtn <- p.rtns[days]-1
  c(tot.rtn,p.rtns.md)
}))

This is my attempt in Python, (if you can make it shorter/more eloquent/more efficient please suggest as answer)

import numpy as np
import pandas as pd
mu = float(0.06)
sigma = float(0.2)
days = float(250)
n = 10000
rtns = np.random.normal(loc=mu/days,scale=(((1/days)**0.5)*sigma),size=days)
rtns1 = rtns+1
prtns = rtns1.cumprod()
totrtn = prtns[len(prtns)-1] -1
h = prtns.tolist()
h.insert(0,float(1))
hdf = pd.DataFrame(prtns)/(pd.DataFrame(h).cummax()[1:len(h)]-1))[1:len(h)]]

and that was as far as I got... wasn't too sure if hdf was correct to get p.rtns.md, and wasnt sure how I would go about simulating this 10000 times.

All suggestions would be greatly appreciated...

Upvotes: 6

Views: 2656

Answers (2)

CT Zhu
CT Zhu

Reputation: 54330

First, your last line of code:

hdf = pd.DataFrame(prtns)/(pd.DataFrame(h).cummax()[1:len(h)]-1))[1:len(h)]]

can't be right. May be this according to your R code:

hdf = (pd.DataFrame(prtns)/(pd.DataFrame(h).cummax()[1:len(h)])-1)[1:len(h)]

Second, c(1,p.rtns) can be replaced with np.hstack(1, prtns) instead of converting the np.array to list.

Third, looks like you are using pandas just for the cummax(). It is not hard to implement one, like this:

def cummax(a):
    ac=a.copy()
    if a.size>0:
        max_idx=np.argmax(a)
        ac[max_idx:]=np.max(ac)
        ac[:max_idx]=cummax(ac[:max_idx])
    else:
        pass
    return ac

And:

>>> a=np.random.randint(0,20,size=10)
>>> a
array([15, 15, 15,  8,  5, 14,  6, 18,  9,  1])
>>> cummax(a)
array([15, 15, 15, 15, 15, 15, 15, 18, 18, 18])

Take these all together we get:

def run_simulation(mu, sigma, days, n):
    result=[]
    for i in range(n):
        rtns = np.random.normal(loc=1.*mu/days,
                    scale=(((1./days)**0.5)*sigma),
                    size=days)
        p_rtns = (rtns+1).cumprod()
        tot_rtn = p_rtns[-1]-1 
        #looks like you want the last element, rather than the 2nd form the last as you did
        p_rtns_md =(p_rtns/cummax(np.hstack((0.,p_rtns)))[1:]-1).min() 
        #looks like you want to skip the first element, python is different from R for that.
        result.append((tot_rtn, p_rtns_md))
    return result

And:

>>> run_simulation(0.06, 0.2, 250,10)
[(0.096077511394818016, -0.16621830496112056), (0.73729333554192, -0.13566124517484235), (0.087761655465907973, -0.17862916081223446), (0.07434851091082928, -0.15972961033789046), (-0.094464694393288307, -0.2317397117033817), (-0.090720761054686627, -0.1454002204893271), (0.02221364097529932, -0.15606214341947877), (-0.12362835704696629, -0.24323096421682033), (0.023089144896788261, -0.16916790589553599), (0.39777037782177493, -0.10524624505023494)]

The loop is actually not necessary as we can work in two dimension by generating 2D array of Guassian random variable (change size=days to size=(days, n)). Avoiding the loop will most likely be faster. However, that will require a different cummax() function as this one show here is restricted to 1D. But cummax() in R is restricted to 1D as well (not exactly, if you pass a 2D to cummax(), it will be flattened). So to keep things simple and comparable between Python and R, let's settle for the loop version.

Upvotes: 2

SimonT
SimonT

Reputation: 2359

I'm unfamiliar with R, but I see some general improvements that could be made to your Python code:

  • Use 0.06 without float() around, since Python will infer that a numeric value with a decimal point is a float
    • The last line, h.insert(0,float(1)) can be replaced with h.insert(0,1.0)
  • You can reference the last item in an iterable using [-1], the second-last using [-2], etc.:
    • totrtn = prtns[-1] -1

Python developers usually choose underscores between words or camelcase. In addition, it is normally preferable to use the full words in your variable names for readability over economy on-screen. For example, some variables here could be renamed to returns and total_returns or totalReturns.

To run your simulation 10000 times, you should use a for loop:

for i in range(10000):
    # code to be repeated 10000 goes in an indented block here
    # more lines in the loop should be indented at same level as previous line
# to mark what code runs after the for loop finishes, just un-indent again
h - prtns.tolist()
...

Upvotes: 4

Related Questions