Vectorizing a Multi-Dimensional Function in Python

Question

I have been a frequent lurker on Stack Overflow for some time, and I usually find clear, useful information here whenever I have coding questions. However, I can't seem to find a thread that addresses my specific question today.

Earlier today, I learned about vectorizing functions in Python to speed up computation. I am currently trying to optimize a Python program that I wrote a little over a month ago. My program takes a text file containing data in the following format:

I then assign each column to lists mag, dmag, and expnum.

What I want to do is create a 2D array of the mag and dmag values that share the same expnum (having the same exposure number means that the mag and dmag point to the same data point).

I do this for every exposure number. At the end, for each of these per-exposure arrays, I take the median of mag, the median of dmag, and the standard deviation of mag, and combine the results into one array that I can plot.
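To make the goal concrete, here is a minimal sketch of the per-exposure statistics on a tiny made-up data set. The numbers and the `summary` dictionary are hypothetical and only illustrate the intended result; they are not taken from the real file.

```python
import numpy as np

# Hypothetical toy data: exposure 101 has three measurements, 102 has two.
mag = np.array([15.0, 18.0, 15.2, 18.2, 15.4])
dmag = np.array([0.01, 0.10, 0.02, 0.20, 0.03])
expnum = np.array([101, 102, 101, 102, 101])

# For each distinct exposure number, select its rows with a boolean mask
# and compute (median mag, median dmag, std of mag).
summary = {}
for num in np.unique(expnum):
    mask = expnum == num
    summary[int(num)] = (
        float(np.median(mag[mask])),
        float(np.median(dmag[mask])),
        float(np.std(mag[mask])),
    )

for num, (mag_med, dmag_med, mag_std) in summary.items():
    print(num, mag_med, dmag_med, mag_std)
```

This still loops once per distinct exposure number, but each pass uses a single array comparison instead of an element-by-element scan.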

Currently, I have the following code:

from numpy import loadtxt,array,asarray,append,std,median,empty,take

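# infile is the path to the text file. loadtxt returns only the selected
# columns, so they are indexed 0-2 here; adjust usecols to match the file.
data = loadtxt(infile,usecols=(0,1,2))
mag = data[:,0].tolist()
dmag = data[:,1].tolist()
expnum = data[:,2].tolist()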

#initialize variables
indexing = list() 
master_mag = list() 
master_dmag = list() 
sub_mag = list() 
sub_dmag = list() 

mag_std = array([]) 
mag_stdmed = array([]) 
mag_med = array([])  

while len(mag) > 0: 
    num=expnum[0] 
    for i in range(0,len(expnum)): 
        if expnum[i] == num: 
            sub_mag.append(mag[i]) 
            sub_dmag.append(dmag[i]) 
            indexing.append(i) 

    #add the sub lists to their master lists
    master_mag.append(sub_mag) 
    master_dmag.append(sub_dmag) 
    sub_mag=list() 
    sub_dmag=list()

    #remove from mag, dmag, and expnum the index referred to by indexing
    while len(indexing) > 0:    
        mag.pop(indexing[-1]) 
        dmag.pop(indexing[-1]) 
        expnum.pop(indexing[-1]) 
        indexing.pop() 

#make the master mag and dmag lists into numpy arrays 
master_mag=asarray(master_mag) 
master_dmag=asarray(master_dmag) 

#generate the mag and dmag median and mag std arrays 
for i in range(0,len(master_mag)): 
    mag_std=append(mag_std,std(master_mag[i])) 
    mag_med=append(mag_med,median(master_mag[i])) 
    mag_stdmed=append(mag_stdmed,median(master_dmag[i])) 

#create empty numpy arrays to be used for mag med vs. mag std 
#and mag med vs. dmag med 
med_std=empty([0,2]) 
med_dmed=empty([0,2]) 

#fill in those arrays 
for i in range(0,len(mag_std)): 
    med_std=append(med_std,[[mag_med[i],mag_std[i]]],axis=0) 
    med_dmed=append(med_dmed,[[mag_med[i],mag_stdmed[i]]],axis=0) 

#sort the median mag and dmag standard deviation arrays by median mag 
order_med_std=med_std[:,0].argsort() 
order_med_dmed=med_dmed[:,0].argsort() 

sorted_med_std=take(med_std,order_med_std,0) 
sorted_med_dmed=take(med_dmed,order_med_dmed,0)

I am then ready to plot sorted_med_dmed[:,0] vs. sorted_med_dmed[:,1] and sorted_med_std[:,0] vs. sorted_med_std[:,1].
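For reference, the two-column arrays and their sorted versions can also be built without growing arrays through repeated append calls. The sketch below uses made-up values for mag_med, mag_std, and mag_stdmed; only the array names follow the code in this question.

```python
import numpy as np

# Hypothetical per-exposure statistics, one entry per exposure number.
mag_med = np.array([18.1, 15.2, 16.7])
mag_std = np.array([0.10, 0.16, 0.12])
mag_stdmed = np.array([0.15, 0.02, 0.05])

# One argsort on the median magnitude gives the row order for both tables.
order = np.argsort(mag_med)

# column_stack pairs the 1D arrays as columns; indexing with `order`
# reorders the rows so that the first column is ascending.
sorted_med_std = np.column_stack((mag_med, mag_std))[order]
sorted_med_dmed = np.column_stack((mag_med, mag_stdmed))[order]

print(sorted_med_std)
print(sorted_med_dmed)
```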

This code works, but I feel that it is too slow (especially once I have more than 10,000 data points to work with). I want to vectorize it to make it much quicker, but I have no idea where to begin.

I would like some help figuring out how to vectorize the matching-by-exposure-number component. I was thinking of creating a multi-dimensional array at the start that has the format: array([[[mag],[dmag]],...]) and a length equal to the number of different exposure numbers. Is there a way to generate and update an array like this in-line, without having to use a ton of loops?
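One possible direction, sketched under assumptions: sort the rows by exposure number once, find where each exposure number starts, and split the sorted arrays at those positions. The function name `group_stats` and the toy data are hypothetical. NumPy has no built-in per-group median, so a short loop over the groups remains, but the repeated scanning and popping of lists is gone.

```python
import numpy as np

def group_stats(mag, dmag, expnum):
    """Return (unique expnum, median mag, std mag, median dmag) per exposure."""
    mag = np.asarray(mag, dtype=float)
    dmag = np.asarray(dmag, dtype=float)
    expnum = np.asarray(expnum)

    # Sort once so that rows with the same exposure number are contiguous.
    order = np.argsort(expnum, kind="stable")
    sorted_exp = expnum[order]

    # starts[k] is the index of the first row of the k-th exposure number.
    uniq, starts = np.unique(sorted_exp, return_index=True)

    # Split at every group boundary except the leading 0.
    mag_groups = np.split(mag[order], starts[1:])
    dmag_groups = np.split(dmag[order], starts[1:])

    # Groups can have different lengths, so reduce them one at a time.
    mag_med = np.array([np.median(g) for g in mag_groups])
    mag_std = np.array([np.std(g) for g in mag_groups])
    dmag_med = np.array([np.median(g) for g in dmag_groups])
    return uniq, mag_med, mag_std, dmag_med

# Hypothetical toy data for illustration only.
uniq, mag_med, mag_std, dmag_med = group_stats(
    [15.0, 18.0, 15.2, 18.2, 15.4],
    [0.01, 0.10, 0.02, 0.20, 0.03],
    [101, 102, 101, 102, 101],
)
print(uniq, mag_med, mag_std, dmag_med)
```

The groups are kept as a list of 1D arrays rather than one multi-dimensional array, since exposures with different numbers of measurements do not fit a rectangular shape.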

Please let me know if you need any further clarification of what this code is doing.

Thank you for your time.
