Invincible
Invincible

Reputation: 422

Python: how to optimize this code

I tried to optimize the code below but I cannot figure out how to improve computation speed. Below code is taking almost 30 secs to run. this is taking time because of bootsam and filedata matrix. Can someone please help me to optimize this code Is it possible to improve the performance?

import numpy as np
filedata=np.genfromtxt('monthlydata1970to2010.txt',dtype='str') # this will creae 980 * 7 matrix
nboot=5000  
results=np.zeros((11,nboot));   #this will create 11*5000 matrix  
results[0,:]=600  
horizon=360  
balance=200  
bootsam=np.random.randint(984, size=(984, nboot)) # this will create 984*5000 matrix
for bs in range(0,nboot):  
   for mn in range(1,horizon+1):  
        if mn%12 ==1:  
            bondbal = 24*balance  
            sp500bal=34*balance  
            russbal = 44*balance  
            eafebal=55*balance  
            cashbal =66*balance  
            bondbal=bondbal*(1+float(filedata[bootsam[mn-1,bs]-1,2]))  
            sp500bal=sp500bal*(1+float(filedata[bootsam[mn-1,bs]-1,3]))  
            russbal=russbal*(1+float(filedata[bootsam[mn-1,bs]-1,4]))  
            eafebal=eafebal*(1+float(filedata[bootsam[mn-1,bs]-1,5]))  
            cashbal=cashbal*(1+float(filedata[bootsam[mn-1,bs]-1,6]))  
            balance=bondbal + sp500bal + russbal + eafebal + cashbal  
        else:  
            bondbal=bondbal*(1+float(filedata[bootsam[mn-1,bs]-1,2]))
            sp500bal=sp500bal*(1+float(filedata[bootsam[mn-1,bs]-1,3]))
            russbal=russbal*(1+float(filedata[bootsam[mn-1,bs]-1,4]))
            eafebal=eafebal*(1+float(filedata[bootsam[mn-1,bs]-1,5]))
            cashbal=cashbal*(1+float(filedata[bootsam[mn-1,bs]-1,6]))
            balance=bondbal + sp500bal + russbal + eafebal + cashbal
            if mn == 60:
               results[1,bs]=balance
            if mn == 120: 
               results[2,bs]=balance
            if mn == 180:
               results[3,bs]=balance
            if mn == 240:
               results[4,bs]=balance
            if mn == 300: 
               results[5,bs]=balance  

Upvotes: 0

Views: 275

Answers (2)

Vicent
Vicent

Reputation: 5452

It is difficult to answer without seeing the real code. I can't get your sample working because balance is set to inf early in the code, as it has been noticed in the comments to the question. Anyway a pretty obvious optimization is not to read the bootsam[mn-1,bs] element five times at every iteration in order to compute the xxbal variables. All those variables use the same bootsam element so you should read the element once and reuse it:

for bs in xrange(0,nboot):
   for mn in xrange(1,horizon+1):
        row = bootsam[mn-1,bs]-1
        if (mn % 12) == 1:  
            bondbal = 24*balance
            sp500bal=34*balance
            russbal = 44*balance
            eafebal=55*balance
            cashbal =66*balance

            bondbal=bondbal*(1+float(filedata[row,2]))  
            sp500bal=sp500bal*(1+float(filedata[row,3]))  
            russbal=russbal*(1+float(filedata[row,4]))  
            eafebal=eafebal*(1+float(filedata[row,5]))  
            cashbal=cashbal*(1+float(filedata[row,6]))  
            balance=bondbal + sp500bal + russbal + eafebal + cashbal
        else:  
            bondbal=bondbal*(1+float(filedata[row,2]))  
            sp500bal=sp500bal*(1+float(filedata[row,3]))  
            russbal=russbal*(1+float(filedata[row,4]))  
            eafebal=eafebal*(1+float(filedata[row,5]))  
            cashbal=cashbal*(1+float(filedata[row,6]))  

The optimized code (which uses a fake value for balance) runs nearly twice faster than the original one on my old Acer Aspire.

Update

If you need further optimizations you can do at least two more things:

  • do not add 1 and convert to float at every accessed element of filedata. Instead add 1 to the array at creation time and give it a float datatype.
  • do not use arithmetic expressions that mix numpy and built-in numbers because Python arithmetic works slower (you can read more on this problem in this SO thread)

The following code follows those advices:

filedata=np.genfromtxt('monthlydata1970to2010.txt',dtype='str') # this will creae 980 * 7 matrix
my_list = (np.float(1) + filedata.astype(np.float)).tolist() # np.float is converted to Python float
nboot=5000
results=np.zeros((11,nboot))   #this will create 11*5000 matrix
results[0,:]=600  
horizon=360
balance=200
bootsam=np.random.randint(5, size=(984, nboot)) # this will create 984*5000 matrix
for bs in xrange(0,nboot):
   for mn in xrange(1,horizon+1):
        row = int(bootsam[mn-1,bs]-1)
        if (mn % 12) == 1:
            bondbal = 24*balance
            sp500bal=34*balance
            russbal = 44*balance
            eafebal=55*balance
            cashbal =66*balance

            bondbal=bondbal*(my_list[row][2])  
            sp500bal=sp500bal*(my_list[row][3])  
            russbal=russbal*(my_list[row][4])  
            eafebal=eafebal*(my_list[row][5])  
            cashbal=cashbal*(my_list[row][6])  
            balance=bondbal + sp500bal + russbal + eafebal + cashbal
        else:  
            bondbal=bondbal*(my_list[row][2])  
            sp500bal=sp500bal*(my_list[row][3])  
            russbal=russbal*(my_list[row][4])  
            eafebal=eafebal*(my_list[row][5])  
            cashbal=cashbal*(my_list[row][6])  
            balance=bondbal + sp500bal + russbal + eafebal + cashbal  

With those changes the code runs nearly twice faster than the previously optimized code.

Upvotes: 2

Ron Klein
Ron Klein

Reputation: 9450

Basic Algebra: executing x = x * 1.23 360 times can be easily converted to a single execution of

x = x * (1.23 ** 360)

Refactor your code and you'll see that the loops are not really needed.

Upvotes: 5

Related Questions