Reputation: 1166
I have written a script that trains a 1D Kohonen network in MATLAB and it works like a charm. I then tried to translate it to Python 2.7, a language I'm fairly new to, and the script takes forever to run.
I'll explain what I'm doing to see if someone here can throw some light on the matter. I have a given dataset in the matrix y
and I want to train different SOMs with it. The SOM is one-dimensional (a line), and its number of neurons varies. I train a SOM of size N=2
at first, and N=NMax
at last, giving a total of NMax-2+1
SOMs. For each SOM, I want to store the weights once the training is over before moving on to the next SOM.
In MATLAB, for NMax = 5
and iterMax = 50
, it takes 9.74 seconds. In Python, 54.04 seconds. This difference is huge, and the actual datasets, number of SOMs, and number of iterations are even larger, so the Python code takes forever to finish.
My current code is the following:
import numpy as np
import time

y = np.random.rand(2500,3) # Create random dataset to test

def A(d,s): # Neighborhood function
    return np.exp(-d**2 / (2*s**2))

sigma_0 = float(5) # Initial standard deviation for A
eta_0 = float(1) # Initial learning rate
iterMax = 250 # Maximum number of iterations
NMax = 10 # Maximum number of neurons
w = range(NMax - 1) # Initialize the size of the weight matrix (it will store NMax-2+1 sets of weights, each of varying size depending on the value of N)

#%% KOHONEN 1D
t = time.time() # Start time
for N in np.arange(2,NMax + 1): # Size of the network
    w[N - 2] = np.random.uniform(0,1,(N,np.size(y,axis=1))) - 0.5 # Initial weights
    iterCount = 1 # Iteration counter
    while iterCount < iterMax:
        # Mix the datapoints to choose them in random order
        mixInputs = y[np.random.permutation(np.size(y,axis = 0)),:]
        # Decrease the value of the variance and the learning rate
        sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount
        eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount
        for kk in range(np.size(mixInputs,axis = 0)): # Picking one datapoint at a time
            selectedInput = mixInputs[kk,:]
            # These two lines calculate the weight that is the nearest to the datapoint selected
            aux = np.absolute(np.array(np.kron(np.ones((N,1)),selectedInput)) - np.array(w[N - 2]))
            aux = np.sum(np.abs(aux)**2,axis=-1)
            ii = np.argmin(aux) # The node ii is the winner
            for jj in range(N):
                dist = min(np.absolute(ii-jj) , np.absolute(np.absolute(ii-jj)-N)) # Centering the neighborhood function in the winner
                w[N - 2][jj,:] = w[N - 2][jj,:] + eta * A(dist,sigma) * (selectedInput - w[N - 2][jj,:]) # Updating the weights
        print(N,iterCount)
        iterCount = iterCount + 1
elapsedTime = time.time() - t
In MATLAB, each iteration (each time the variable iterCount
increases by 1) would be almost instant. In Python, each one takes a long time. I don't know why they perform so differently, but I would like to see if it is possible to speed up the Python version. Any suggestions?
EDIT: As requested in the comments, here is the much-faster MATLAB version of the code.
y = rand(2500,3) % Random dataset
A = @(d,s) exp(-d^2 / (2*s^2));
sigma_0 = 5;
eta_0 = 1;
iterMax = 250;
NMax = 10;
w = cell(NMax - 1,1);

%% KOHONEN 1D
tic();
for N = 2 : NMax
    w{N - 1} = rand(N,size(y,2)) - 0.5;
    iterCount = 1;
    while (iterCount < iterMax)
        mixInputs = y(randperm(size(y,1)),:);
        sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount;
        eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount;
        for kk = 1 : size(mixInputs,1)
            input = mixInputs(kk,:);
            % [~,ii] = min(pdist2(input,w{N - 1}));
            aux = abs(repmat(input,N,1) - w{N - 1});
            [~,ii] = min((sum(aux.^2,2)));
            for jj = 1 : N
                dist = min(abs(ii-jj) , abs(abs(ii-jj)-N));
                w{N - 1}(jj,:) = w{N - 1}(jj,:) + eta * A(dist,sigma) * (input - w{N - 1}(jj,:));
            end
        end
        N % Show N
        iterCount = iterCount + 1 % Show iterCount
    end
end
toc();
Upvotes: 2
Views: 191
Reputation: 24945
Here's a quick stab at some speedups - I think the output is the same, but do take some time to double-check it:
import numpy as np
import time

np.random.seed(1234)

y = np.random.rand(2500,3) # Create random dataset to test

sigma_0 = float(5) # Initial standard deviation for A
eta_0 = float(1) # Initial learning rate
iterMax = 10 # Maximum number of iterations
NMax = 10 # Maximum number of neurons
w = {} # Initialize the size of the weight matrix (it will store NMax-2+1 sets of weights, each of varying size depending on the value of N)

#%% KOHONEN 1D
t = time.time() # Start time
for N in np.arange(2,NMax + 1): # Size of the network
    w[N - 2] = np.random.uniform(0,1,(N,np.size(y,axis=1))) - 0.5 # Initial weights
    iterCount = 1 # Iteration counter
    while iterCount < iterMax:
        # Mix the datapoints to choose them in random order
        mixInputs = y[np.random.permutation(np.size(y,axis = 0)),:]
        # Decrease the value of the variance and the learning rate
        sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount
        s2 = 2*sigma**2
        eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount
        for kk in range(np.size(mixInputs,axis = 0)): # Picking one datapoint at a time
            selectedInput = mixInputs[kk,:]
            # These lines calculate the weight that is the nearest to the datapoint selected
            aux = np.sum((selectedInput - np.array(w[N - 2]))**2, axis = -1)
            ii = np.argmin(aux)
            jjs = np.abs(ii - np.arange(N))
            dists = np.min(np.vstack([jjs , np.absolute(jjs-N)]), axis = 0)
            w[N - 2] = w[N - 2] + eta * np.exp((-dists**2)/s2).T[:,np.newaxis] * (selectedInput - w[N - 2]) # Updating the weights
        print(N,iterCount)
        iterCount = iterCount + 1
elapsedTime = time.time() - t
Key speedups come mostly from using broadcasting to reduce loops and function calls.
We can replace the line:
aux = np.absolute(np.array(np.kron(np.ones((N,1)),selectedInput)) - np.array(w[N - 2]))
with:
aux = np.abs(selectedInput - np.array(w[N - 2]))
(I have further crammed this into a couple of the next steps.) NumPy broadcasting gives us the same result without having to take the kron product.
As an example:
np.kron(np.ones((3,1)), np.array([6,5,4])) - np.arange(-9,0).reshape(3,3)
gives the same output as:
np.array([6,5,4]) - np.arange(-9,0).reshape(3,3)
np.kron(np.ones((N,1)), x)
builds an (N, x.shape[0])
array holding N copies of x, while broadcasting takes care of this in a much cheaper manner.
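For a quick, self-contained check of that equality, here is a minimal sketch using the toy arrays from the example above as stand-ins for selectedInput and w[N - 2]:

import numpy as np

x = np.array([6, 5, 4], dtype=float)              # Stand-in for selectedInput
W = np.arange(-9, 0, dtype=float).reshape(3, 3)   # Stand-in for the weight matrix w[N - 2]

# Tiling x explicitly with kron, then subtracting
tiled = np.kron(np.ones((3, 1)), x) - W

# Letting broadcasting repeat the 1-D row across the rows of W
broadcast = x - W

print(np.array_equal(tiled, broadcast))  # True: identical result, no kron needed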
The other main speedup is to reduce the:
for jj in range(N):
loop to a single matrix operation. We calculate 2*sigma**2
once per iteration, replace the A function with a native numpy call, and vectorize the rest.
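As a sanity check that the vectorized update matches the original per-node loop, here is a small sketch with made-up values for N, eta, sigma and the winning node ii (the names mirror the original script; the numbers are arbitrary):

import numpy as np

N = 5
rng = np.random.RandomState(0)
W = rng.rand(N, 3) - 0.5          # Toy weight matrix, like w[N - 2]
selectedInput = rng.rand(3)
eta, sigma = 0.5, 2.0
ii = 2                            # Pretend node 2 won

# Original per-node loop
W_loop = W.copy()
for jj in range(N):
    dist = min(abs(ii - jj), abs(abs(ii - jj) - N))
    W_loop[jj, :] += eta * np.exp(-dist**2 / (2 * sigma**2)) * (selectedInput - W_loop[jj, :])

# Vectorized version: all node distances and all row updates in one shot
jjs = np.abs(ii - np.arange(N))
dists = np.minimum(jjs, np.abs(jjs - N))
W_vec = W + eta * np.exp(-dists**2 / (2 * sigma**2))[:, np.newaxis] * (selectedInput - W)

print(np.allclose(W_loop, W_vec))  # True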
Upvotes: 1
Reputation: 6979
Use the profiling module to find out what functions are being called, and how long they are taking.
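If you'd rather drive the profiler from inside the script than from the command line (shown further down), a minimal sketch looks like this; run_training() is just a placeholder for whatever top-level function wraps your training loop:

import cProfile
import pstats

# Profile one call and print a report sorted by internal time
cProfile.run('run_training()', sort='time')

# Or dump the raw stats to a file and dig into them afterwards
cProfile.run('run_training()', 'som.prof')
pstats.Stats('som.prof').sort_stats('time').print_stats(10)  # 10 most expensive functions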
In the output below, the columns have the following meanings:
ncalls for the number of calls,
tottime for the total time spent in the given function (and excluding time made in calls to sub-functions)
percall is the quotient of tottime divided by ncalls
cumtime is the cumulative time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions.
percall is the quotient of cumtime divided by primitive calls
filename:lineno(function) provides the respective data of each function
It looks like you're calling A()
many, many times... often with the same value.
Output of python2.7 -m cProfile -s tottime ${YOUR_SCRIPT}
5481855 function calls (5481734 primitive calls) in 4.986 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.572 1.572 4.986 4.986 x.py:1(<module>)
214500 0.533 0.000 0.533 0.000 x.py:8(A)
107251 0.462 0.000 1.986 0.000 shape_base.py:686(kron)
107251 0.345 0.000 0.456 0.000 numeric.py:1015(outer)
214502 0.266 0.000 0.563 0.000 {sorted}
...
Try caching the values:
A_vals = {}
def A(d,s): # Neighborhood function
    t = (d,s)
    if t in A_vals:
        return A_vals[t]
    ret = np.exp(-d**2 / (2*s**2))
    A_vals[t] = ret
    return ret
Now we see:
6206113 function calls (6205992 primitive calls) in 4.986 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 1.727 1.727 4.986 4.986 x.py:1(<module>)
121451 0.491 0.000 2.180 0.000 shape_base.py:686(kron)
121451 0.371 0.000 0.496 0.000 numeric.py:1015(outer)
242902 0.293 0.000 0.621 0.000 {sorted}
121451 0.265 0.000 0.265 0.000 {method 'reduce' of 'numpy.ufunc' objects}
...
242900 0.091 0.000 0.091 0.000 x.py:7(A)
...
At this point it becomes a simple optimisation exercise!...
Next on your list is kron()
- enjoy!
You'll also find it helpful (both from a style point of view and for profiling) to break your script into smaller functions. I've done the following purely for profiling reasons, so you would be better off using sensible names and potentially finding a better segmentation.
import numpy as np
import time

y = np.random.rand(2500,3) # Create random dataset to test

A_vals = {}
def A(d,s): # Neighborhood function
    t = (d,s)
    if t in A_vals:
        return A_vals[t]
    ret = np.exp(-d**2 / (2*s**2))
    A_vals[t] = ret
    return ret

def a():
    sigma_0 = float(5) # Initial standard deviation for A
    eta_0 = float(1) # Initial learning rate
    iterMax = 250 # Maximum number of iterations
    NMax = 10 # Maximum number of neurons
    w = range(NMax - 1) # Initialize the size of the weight matrix (it will store NMax-2+1 sets of weights, each of varying size depending on the value of N)
    #%% KOHONEN 1D
    t = time.time() # Start time
    for N in np.arange(2,NMax + 1): # Size of the network
        b(w, N, sigma_0, eta_0, iterMax)
    elapsedTime = time.time() - t

def b(w, N, sigma_0, eta_0, iterMax):
    w[N - 2] = np.random.uniform(0,1,(N,np.size(y,axis=1))) - 0.5 # Initial weights
    for iterCount in range(1, iterMax):
        c(N, sigma_0, eta_0, iterMax, iterCount, w)

def c(N, sigma_0, eta_0, iterMax, iterCount, w):
    # Mix the datapoints to choose them in random order
    mixInputs = y[np.random.permutation(np.size(y,axis = 0)),:]
    # Decrease the value of the variance and the learning rate
    sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount
    eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount
    for kk in range(np.size(mixInputs,axis = 0)): # Picking one datapoint at a time
        d(N, w, mixInputs, sigma, eta, kk)
    print(N,iterCount)

def d(N, w, mixInputs, sigma, eta, kk):
    selectedInput = mixInputs[kk,:]
    # These two lines calculate the weight that is the nearest to the datapoint selected
    aux = np.absolute(np.array(np.kron(np.ones((N,1)),selectedInput)) - np.array(w[N - 2]))
    aux = np.sum(np.abs(aux)**2,axis=-1)
    ii = np.argmin(aux) # The node ii is the winner
    for jj in range(N):
        e(N, w, sigma, eta, selectedInput, ii, jj)

def e(N, w, sigma, eta, selectedInput, ii, jj):
    dist = min(np.absolute(ii-jj) , np.absolute(np.absolute(ii-jj)-N)) # Centering the neighborhood function in the winner
    f(N, w, sigma, eta, selectedInput, jj, dist)

def f(N, w, sigma, eta, selectedInput, jj, dist):
    w[N - 2][jj,:] = w[N - 2][jj,:] + eta * A(dist,sigma) * (selectedInput - w[N - 2][jj,:]) # Updating the weights

a()
6701974 function calls (6701853 primitive calls) in 4.985 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
238921 0.691 0.000 0.777 0.000 x.py:56(f)
119461 0.613 0.000 4.923 0.000 x.py:43(d)
119461 0.498 0.000 2.144 0.000 shape_base.py:686(kron)
238921 0.462 0.000 1.280 0.000 x.py:52(e)
119461 0.369 0.000 0.495 0.000 numeric.py:1015(outer)
This identifies f()
as the biggest time sink.
Upvotes: 2