Tendero

Reputation: 1166

Why is this script so slow in Python?

I have written a script that trains a 1D Kohonen network in MATLAB, and it works like a charm. I then tried to translate it to Python 2.7, a language I'm fairly new to, and the script takes forever to run.

I'll explain what I'm doing to see if someone here can shed some light on the matter. I have a given dataset in the matrix y, and I want to train different SOMs with it. Each SOM is one-dimensional (a line), and its number of neurons varies. I train a SOM of size N = 2 first, and one of size N = NMax last, giving a total of NMax - 2 + 1 SOMs. For each SOM, I want to store the weights once training is over before moving on to the next SOM.

In MATLAB, for NMax = 5 and iterMax = 50, it takes 9.74 seconds. In Python, 54.04 seconds. This difference is huge, and the actual datasets, number of SOMs, and number of iterations are even larger, so the Python code takes forever to finish.

My current code is the following:

import numpy as np
import time
y = np.random.rand(2500,3) # Create random dataset to test
def A(d,s): # Neighborhood function
    return np.exp(-d**2 / (2*s**2))
sigma_0 = float(5) # Initial standard deviation for A
eta_0 = float(1) # Initial learning rate
iterMax = 250 # Maximum number of iterations
NMax = 10 # Maximum number of neurons
w = range(NMax - 1) # Initialize the size of the weight matrix (it will store NMax-2+1 sets of weights, each of varying size depending on the value of N)
#%% KOHONEN 1D
t = time.time() # Start time
for N in np.arange(2,NMax + 1): # Size of the network
    w[N - 2] = np.random.uniform(0,1,(N,np.size(y,axis=1))) - 0.5 # Initial weights
    iterCount = 1; # Iteration counter
    while iterCount < iterMax:
        # Mix the datapoints to choose them in random order
        mixInputs = y[np.random.permutation(np.size(y,axis = 0)),:]
        # Decrease the value of the variance and the learning rate
        sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount
        eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount
        for kk in range(np.size(mixInputs,axis = 0)): # Picking one datapoint at a time
            selectedInput = mixInputs[kk,:]
            # These two lines calculate the weight that is the nearest to the datapoint selected
            aux = np.absolute(np.array(np.kron(np.ones((N,1)),selectedInput)) - np.array(w[N - 2]))
            aux = np.sum(np.abs(aux)**2,axis=-1)
            ii = np.argmin(aux) # The node ii is the winner
            for jj in range(N):
                dist = min(np.absolute(ii-jj) , np.absolute(np.absolute(ii-jj)-N)) # Centering the neighborhood function in the winner
                w[N - 2][jj,:] = w[N - 2][jj,:] + eta * A(dist,sigma) * (selectedInput - w[N - 2][jj,:]) # Updating the weights
        print(N,iterCount)
        iterCount = iterCount + 1 

elapsedTime = time.time() - t

In MATLAB, each iteration (each time the variable iterCount increases by 1) is almost instant. In Python, each one takes a long time. I don't know why they perform so differently, but I would like to see if it is possible to speed up the Python version. Any suggestions?

EDIT: As requested in the comments, here is the much-faster MATLAB version of the code.

y = rand(2500,3) % Random dataset
A = @(d,s) exp(-d^2 / (2*s^2));
sigma_0 = 5;
eta_0 = 1;
iterMax = 250;
NMax = 10;
w = cell(NMax - 1,1);

%% KOHONEN 1D
tic();
for N = 2 : NMax 
    w{N - 1} = rand(N,size(y,2)) - 0.5;
    iterCount = 1; 
    while (iterCount < iterMax)
        mixInputs = y(randperm(size(y,1)),:);
        sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount;
        eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount;
        for kk = 1 : size(mixInputs,1)
            input = mixInputs(kk,:);
            % [~,ii] = min(pdist2(input,w{N - 1}));
            aux = abs(repmat(input,N,1) - w{N - 1});
            [~,ii] = min((sum(aux.^2,2))); 
            for jj = 1 : N
                dist = min(abs(ii-jj) , abs(abs(ii-jj)-N));
                w{N - 1}(jj,:) = w{N - 1}(jj,:) + eta * A(dist,sigma) * (input - w{N - 1}(jj,:));
            end
        end
        N % Show N
        iterCount = iterCount + 1 % Show iterCount
    end
end
toc();

Upvotes: 2

Views: 191

Answers (2)

jeremycg

Reputation: 24945

Here's a quick stab at some speedups. I think the output is the same, but do take some time to double-check it:

import numpy as np
import time

np.random.seed(1234)
y = np.random.rand(2500,3) # Create random dataset to test

sigma_0 = float(5) # Initial standard deviation for A
eta_0 = float(1) # Initial learning rate
iterMax = 10 # Maximum number of iterations
NMax = 10 # Maximum number of neurons
w = {} # Initialize the size of the weight matrix (it will store NMax-2+1 sets of weights, each of varying size depending on the value of N)
#%% KOHONEN 1D
t = time.time() # Start time
for N in np.arange(2,NMax + 1): # Size of the network
    w[N - 2] = np.random.uniform(0,1,(N,np.size(y,axis=1))) - 0.5 # Initial weights
    iterCount = 1; # Iteration counter
    while iterCount < iterMax:
        # Mix the datapoints to choose them in random order
        mixInputs = y[np.random.permutation(np.size(y,axis = 0)),:]
        # Decrease the value of the variance and the learning rate
        sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount
        s2 = 2*sigma**2
        eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount
        for kk in range(np.size(mixInputs,axis = 0)): # Picking one datapoint at a time
            selectedInput = mixInputs[kk,:]
            # These two lines calculate the weight that is the nearest to the datapoint selected
            aux = np.sum((selectedInput - np.array(w[N - 2]))**2, axis = -1)
            ii = np.argmin(aux)
            jjs = np.abs(ii - np.arange(N))
            dists = np.min(np.vstack([jjs , np.absolute(jjs-N)]), axis = 0)
            w[N - 2] = w[N - 2] + eta * np.exp(-dists**2 / s2)[:,np.newaxis] * (selectedInput - w[N - 2]) # Updating the weights

        print(N,iterCount)
        iterCount = iterCount + 1 

elapsedTime = time.time() - t

Key speedups are mostly in using broadcasting to reduce loops/function calls.

We can replace the line:

 aux = np.absolute(np.array(np.kron(np.ones((N,1)),selectedInput)) - np.array(w[N - 2]))

with:

aux = np.abs(selectedInput - np.array(w[N - 2]))

(I have further folded this into a couple of the next steps.) Numpy broadcasting gives us the same result without having to take the kron product.

As an example:

np.kron(np.ones((3,1)), np.array([6,5,4])) - np.arange(-9,0).reshape(3,3)

is the same output as:

np.array([6,5,4]) - np.arange(-9,0).reshape(3,3)

np.kron(np.ones((N,1)), x) gives an N-by-x.shape[0] array containing N stacked copies of x. Broadcasting takes care of this in a much cheaper manner.
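As a quick sanity check that the two are equivalent (a minimal sketch; x and W here are stand-ins for selectedInput and w[N - 2]):

import numpy as np

N = 4
x = np.array([6.0, 5.0, 4.0])                        # stand-in for selectedInput
W = np.random.rand(N, 3)                             # stand-in for w[N - 2]
via_kron = np.abs(np.kron(np.ones((N, 1)), x) - W)   # original kron-based approach
via_broadcast = np.abs(x - W)                        # broadcasting does the same work
print(np.allclose(via_kron, via_broadcast))          # True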

The other main speed-up is to reduce the inner loop:

for jj in range(N):

to a matrix operation. We calculate 2*sigma**2 once per iteration, replace the A function with a native numpy call, and vectorize the rest.
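For clarity, here is that transformation in isolation, as a minimal sketch with made-up values (W, ii, eta, and sigma stand in for the corresponding variables inside the loop):

import numpy as np

N, ii, eta, sigma = 5, 2, 0.5, 1.0                   # made-up values for the sketch
selectedInput = np.random.rand(3)
W = np.random.rand(N, 3)                             # stand-in for w[N - 2]
s2 = 2 * sigma**2                                    # computed once per outer iteration

# Original version: update one neuron at a time
W_loop = W.copy()
for jj in range(N):
    dist = min(abs(ii - jj), abs(abs(ii - jj) - N))
    W_loop[jj,:] = W_loop[jj,:] + eta * np.exp(-dist**2 / s2) * (selectedInput - W_loop[jj,:])

# Vectorized version: update all neurons in one broadcasted expression
jjs = np.abs(ii - np.arange(N))
dists = np.minimum(jjs, np.abs(jjs - N))
W_vec = W + eta * np.exp(-dists**2 / s2)[:,np.newaxis] * (selectedInput - W)

print(np.allclose(W_loop, W_vec))                    # True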

Upvotes: 1

Attie

Reputation: 6979

Use the cProfile module to find out which functions are being called and how long they are taking.
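If you prefer to profile from inside the script rather than from the command line, something along these lines also works (a sketch; main() is a hypothetical function wrapping your whole training loop):

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
main()                                               # hypothetical entry point for the training loop
profiler.disable()
pstats.Stats(profiler).sort_stats('tottime').print_stats(10)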

In the output below, the columns have the following meanings:

ncalls for the number of calls,

tottime for the total time spent in the given function (excluding time spent in calls to sub-functions)

percall is the quotient of tottime divided by ncalls

cumtime is the cumulative time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions.

percall is the quotient of cumtime divided by primitive calls

filename:lineno(function) provides the respective data of each function


It looks like you're calling A() many, many times... often with the same value.

Output of python2.7 -m cProfile -s tottime ${YOUR_SCRIPT}

         5481855 function calls (5481734 primitive calls) in 4.986 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.572    1.572    4.986    4.986 x.py:1(<module>)
   214500    0.533    0.000    0.533    0.000 x.py:8(A)
   107251    0.462    0.000    1.986    0.000 shape_base.py:686(kron)
   107251    0.345    0.000    0.456    0.000 numeric.py:1015(outer)
   214502    0.266    0.000    0.563    0.000 {sorted}
...

Try caching the values:

A_vals = {}

def A(d,s): # Neighborhood function
    t = (d,s)
    if t in A_vals:
        return A_vals[t]
    ret = np.exp(-d**2 / (2*s**2))
    A_vals[t] = ret
    return ret
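
If you'd rather keep the caching out of the function body, a small memoizing decorator does the same job (a sketch, not required for the fix above):

import numpy as np

def memoize(fn):
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return wrapper

@memoize
def A(d,s): # Neighborhood function, cached per (d,s) pair
    return np.exp(-d**2 / (2*s**2))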

Now we see:

         6206113 function calls (6205992 primitive calls) in 4.986 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.727    1.727    4.986    4.986 x.py:1(<module>)
   121451    0.491    0.000    2.180    0.000 shape_base.py:686(kron)
   121451    0.371    0.000    0.496    0.000 numeric.py:1015(outer)
   242902    0.293    0.000    0.621    0.000 {sorted}
   121451    0.265    0.000    0.265    0.000 {method 'reduce' of 'numpy.ufunc' objects}
...
   242900    0.091    0.000    0.091    0.000 x.py:7(A)
...

At this point it becomes a simple optimisation exercise!...

Next on your list is kron() - enjoy!
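
A sketch of one possible replacement for the kron() call: let numpy broadcasting do the row-wise subtraction instead of building the repeated-copies matrix (selectedInput and w are the same variables as in your code):

aux = np.sum((selectedInput - w[N - 2])**2, axis=-1) # Broadcasting subtracts selectedInput from every row of w[N - 2]
ii = np.argmin(aux) # The node ii is the winner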


You'll also find it helpful (from both a style and a profiling point of view) to break your script into smaller functions. I've done the following purely for profiling reasons, so you would be better off using sensible names and potentially making better segmentation.

import numpy as np
import time

y = np.random.rand(2500,3) # Create random dataset to test

A_vals = {}
def A(d,s): # Neighborhood function
    t = (d,s)
    if t in A_vals:
        return A_vals[t]
    ret = np.exp(-d**2 / (2*s**2))
    A_vals[t] = ret
    return ret

def a():
    sigma_0 = float(5) # Initial standard deviation for A
    eta_0 = float(1) # Initial learning rate
    iterMax = 250 # Maximum number of iterations
    NMax = 10 # Maximum number of neurons
    w = range(NMax - 1) # Initialize the size of the weight matrix (it will store NMax-2+1 sets of weights, each of varying size depending on the value of N)
    #%% KOHONEN 1D
    t = time.time() # Start time
    for N in np.arange(2,NMax + 1): # Size of the network
        b(w, N, sigma_0, eta_0, iterMax)

    elapsedTime = time.time() - t

def b(w, N, sigma_0, eta_0, iterMax):
    w[N - 2] = np.random.uniform(0,1,(N,np.size(y,axis=1))) - 0.5 # Initial weights
    for iterCount in range(1, iterMax):
        c(N, sigma_0, eta_0, iterMax, iterCount, w)

def c(N, sigma_0, eta_0, iterMax, iterCount, w):
    # Mix the datapoints to choose them in random order
    mixInputs = y[np.random.permutation(np.size(y,axis = 0)),:]
    # Decrease the value of the variance and the learning rate
    sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount
    eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount
    for kk in range(np.size(mixInputs,axis = 0)): # Picking one datapoint at a time
        d(N, w, mixInputs, sigma, eta, kk)
    print(N,iterCount)

def d(N, w, mixInputs, sigma, eta, kk):
    selectedInput = mixInputs[kk,:]
    # These two lines calculate the weight that is the nearest to the datapoint selected
    aux = np.absolute(np.array(np.kron(np.ones((N,1)),selectedInput)) - np.array(w[N - 2]))
    aux = np.sum(np.abs(aux)**2,axis=-1)
    ii = np.argmin(aux) # The node ii is the winner
    for jj in range(N):
        e(N, w, sigma, eta, selectedInput, ii, jj)

def e(N, w, sigma, eta, selectedInput, ii, jj):
    dist = min(np.absolute(ii-jj) , np.absolute(np.absolute(ii-jj)-N)) # Centering the neighborhood function in the winner
    f(N, w, sigma, eta, selectedInput, jj, dist)

def f(N, w, sigma, eta, selectedInput, jj, dist):
    w[N - 2][jj,:] = w[N - 2][jj,:] + eta * A(dist,sigma) * (selectedInput - w[N - 2][jj,:]) # Updating the weights

a()
Profiling again with the script split up gives:

         6701974 function calls (6701853 primitive calls) in 4.985 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   238921    0.691    0.000    0.777    0.000 x.py:56(f)
   119461    0.613    0.000    4.923    0.000 x.py:43(d)
   119461    0.498    0.000    2.144    0.000 shape_base.py:686(kron)
   238921    0.462    0.000    1.280    0.000 x.py:52(e)
   119461    0.369    0.000    0.495    0.000 numeric.py:1015(outer)

This identifies f() as the biggest time sink.

Upvotes: 2
