Reputation: 5231
I was trying to implement the following equation in MATLAB:

To explain some of the notation: df/dt^(1)_{i,j} should be a vector, z^{(2)}_{k2} is a real number, a^{(2)}_{i,j} is a real number, [t^{(2)}_{k2}] is a vector, x_i is a vector, and t^{(1)}_{i,j} is a vector. For more clarifying comments on the notation, have a look at the related math.stackexchange question. Also, I've tried to comment the code heavily, stating what the inputs and outputs should be, to minimize confusion about the dimensions of the variables in question.
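To make the dimension conventions concrete, here is a small illustrative sketch (the sizes are arbitrary and only the shapes matter; the reshapes mirror the ones used in my code below):

% arbitrary sizes, purely to illustrate the shapes used in the code below
Dp = 3; Dd = 2; Np = 4; K2 = 5;
K1 = Dd * Np;
t1 = rand(Dp, Dd, Np);               % each t1(:,a,b) is one length-Dp center vector
x  = rand(Dp * Np, 1);               % data vector, split into Np parts of length Dp
x_parts = reshape(x, [Dp, Np]);      % one length-Dp part per column
z_l1 = rand(Np, Dd);                 % layer-1 inputs, one scalar per (part, center) pair
a_l2 = exp(-z_l1);                   % layer-2 activations, same shape as z_l1
a_l2_vec = reshape(a_l2', [K1, 1]);  % flattened activations fed into the second layer
t2 = rand(K1, K2);                   % each column t2(:,k2) is one length-K1 center vector
c  = rand(K2, 1);                    % output weights
z_l2 = rand(K2, 1);                  % layer-2 inputs, one scalar per k2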
I actually do have one potential implementation (that I believe is correct), but MATLAB sometimes has some nice hidden tricks, so I was wondering if this is a good implementation of the vectorized equation above or if there is a better one.
Currently here is my code:
function [ dJ_dt1 ] = compute_t1_gradient(t1,x,y,f,z_l1,z_l2,a_l2,c,t2,lambda)
%compute_t1_gradient_loops - computes the t1 parameter of a 2 layer HBF
% Computes dJ_dt1 according to:
% dJ_dt1
% Input:
% t1 = centers (Dp x Dd x Np)
% x = data (D x 1)
% y = label (1 x 1)
% f = f(x) (1 x 1)
% z_l1 = inputs l1 (Np x Dd)
% z_l2 = inputs l2 (K2 x 1)
% a_l2 = activations l2 (Np x Dd)
% a_l3 = activations l3 (K2 x 1)
% c = weights (K2 x 1)
% t2 = centers (K1 x K2)
% lambda = reg param (1 x 1)
% mu_c = step size (1 x 1)
% Output:
% dJ_dt1 = gradient (Dp x Dd x Np)
[Dp, ~, ~] = size(t1);
[Np, Dd] = size(a_l2);
x_parts = reshape(x, [Dp, Np])'; % Np x Dp
K1 = Np * Dd;
a_l2_col_vec = reshape(a_l2', [K1, 1]); %K1 x 1
alpha = bsxfun(@minus, a_l2_col_vec, t2); %K1 x K2
c_z_l2 = (c .* exp(-z_l2))'; % 1 x K2
alpha = bsxfun(@times, c_z_l2, alpha); %K1 x K2
alpha = bsxfun(@times, reshape(exp(-z_l1'),[K1, 1]) , alpha);
alpha = sum(alpha, 2); %K1 x 1
xi_t1 = bsxfun(@minus, x_parts', permute(t1, [1,3,2]));
% alpha K1 x 1
% xi_t1 Dp x Np x Dd
dJ_dt1 = bsxfun(@minus, reshape(alpha,[Dd, Np]), permute(xi_t1, [3, 2, 1]));
dJ_dt1 = permute(dJ_dt1,[3,1,2]);
dJ_dt1 = -4*(y-f)*dJ_dt1;
dJ_dt1 = dJ_dt1 + lambda * 0; %TODO
end
Actually, at this point I decided to implement the above function again as a for loop. Unfortunately, they don't produce the same answer, which makes me skeptical that the above is correct. I will paste the for loop code of what I wanted/intended to vectorize:
function [ dJ_dt1 ] = compute_t1_gradient_loops(t1,x,y,f,z_l1,z_l2,a_l2,c,t2)
%compute_t1_gradient_loops - computes the t1 parameter of a 2 layer HBF
% Computes t1 according to:
% t1 := t1 - mu_c * dJ/dt1
% Input:
% t1 = centers (Dp x Dd x Np)
% x = data (D x 1)
% y = label (1 x 1)
% f = f(x) (1 x 1)
% z_l1 = inputs l1 (Np x Dd)
% z_l2 = inputs l2 (K2 x 1)
% a_l2 = activations l2 (Np x Dd)
% a_l3 = activations l3 (K2 x 1)
% c = weights (K2 x 1)
% t2 = centers (K1 x K2)
% lambda = reg param (1 x 1)
% mu_c = step size (1 x 1)
% Output:
% dJ_dt1 = gradient (Dp x Dd x Np)
[Dp, ~, ~] = size(t1); %(Dp x Dd x Np)
[Np, Dd] = size(a_l2);
K2 = length(c);
t2_tensor = reshape(t2, Dd, Np, K2);
x_parts = reshape(x, [Dp, Np]);
dJ_dt1 = zeros(Dp, Dd, Np);
for i=1:Dd
xi = x_parts(:,i);
for j=1:Np
t_l1_ij = t1(:,i,j);
a_l2_ij = a_l2(j, i);
z_l1_ij = z_l1(j,i);
alpha_ij = 0;
for k2=1:K2
t2_k2ij = t2_tensor(i,j,k2);
c_k2 = c(k2);
z_l2_k2 = z_l2(k2);
new_delta = c_k2*-1*exp(-z_l2_k2)*2*(a_l2_ij - t2_k2ij);
alpha_ij = alpha_ij + new_delta;
end
alpha_ij = 2*(y-f)*-1*exp(-z_l1_ij)*2*(xi - t_l1_ij);
dJ_dt1(:,i,j) = alpha_ij;
end
end
end
I've actually even approximated the derivative the way Andrew Ng suggests for checking gradient-descent-like equations, i.e. with the symmetric difference quotient (J(t1 + eps) - J(t1 - eps)) / (2*eps); to that end I even wrote the code for it:
%% update t1 unit test
%% dimensions
Dp = 3;
Np = 4;
Dd = 2;
K2 = 5;
K1 = Dd * Np;
%% fake data & params
x = (1:Dp*Np)';
y = 3;
c = (1:K2)';
t2 = rand(K1, K2);
t1 = rand(Dp, Dd, Np);
lambda = 0;
mu_t1 = 1;
%% call f(x)
[f, z_l1, z_l2, a_l2, ~ ] = f_star(x,c,t1,t2,Np,Dp);
%% update gradient
dJ_dt1_ij_loops = compute_t1_gradient_loops(t1,x,y,f,z_l1,z_l2,a_l2,c,t2);
dJ_dt1 = compute_t1_gradient(t1,x,y,f,z_l1,z_l2,a_l2,c,t2,lambda);
eps = 1e-4;
e_111 = zeros( size(t1) );
e_111(1,1,1) = eps;
derivative = (J(y, x, c, t2, t1 + e_111, Np, Dp) - J(y, x, c, t2, t1 - e_111, Np, Dp) ) / (2*eps);
derivative
dJ_dt1_ij_loops(1,1,1)
dJ_dt1(1,1,1)
but it seems that neither of the derivatives agrees with the "approximated" one. The output for one run looked as follows:
>> update_t1_gradient_unit_test
derivative =
0.0027
dJ_dt1_ij_loops
ans =
0.0177
dJ_dt1
ans =
-0.5182
>>
so it isn't clear to me whether there is a mistake or not... it seems that it nearly matches the one with the loop, but is that close enough?
Andrew Ng does say that the numerical approximation and the analytic gradient should agree to about 4 significant digits; however, I don't see 4 significant digits agreeing! Not even the same order of magnitude :( I'd guess both are wrong but I can't seem to catch why or where/how.
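(For what it's worth, one common way to quantify how well two gradient arrays agree, instead of eyeballing single entries, is a relative error over all entries; this is not in my script above, just a sketch of the idea:)

g_loops = dJ_dt1_ij_loops(:);   % flattened gradient from the looped version
g_vect  = dJ_dt1(:);            % flattened gradient from the vectorized version
rel_err = norm(g_loops - g_vect) / (norm(g_loops) + norm(g_vect))
% roughly: values near machine precision mean the two agree, values near 1 mean they do not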
On a related note, I've also asked whether the derivative that I have at the top is actually mathematically correct, since at this point I am not sure which part is wrong and which part is correct. The link to the question is here:
Update:
I have implemented a new version of the derivative with loops and it nearly agrees with a small example I created.
Here is the new implementation (with a bug somewhere...):
function [ dJ_dt1 ] = compute_df_dt1_loops3(t1,x,z_l1,z_l2,a_l2,c,t2)
% Computes t1 according to:
% df/dt1
% Input:
% t1 = centers (Dp x Dd x Np)
% x = data (D x 1)
% z_l1 = inputs l1 (Np x Dd)
% z_l2 = inputs l2 (K2 x 1)
% a_l2 = activations l2 (Np x Dd)
% a_l3 = activations l3 (K2 x 1)
% c = weights (K2 x 1)
% t2 = centers (K1 x K2)
% Output:
% dJ_dt1 = gradient (Dp x Dd x Np)
[Dp, Dd, Np] = size(t1); %(Dp x Dd x Np)
K2 = length(c);
x_parts = reshape(x, [Dp, Np]);
dJ_dt1 = zeros(Dp, Dd, Np);
for i=1:Np
xi_part = x_parts(:,i);
for j=1:Dd
z_l1_ij = z_l1(i,j);
a_l2_ij = a_l2(i,j);
t_l1_ij = t1(:,i,j);
alpha_ij = 0;
for k2=1:K2
ck2 = c(k2);
t2_k2 = t2(:, k2);
index = (i-1)*Dd + j;
t2_k2_ij = t2_k2(index);
z_l2_k2 = z_l2(k2);
new_delta = ck2*(exp(-z_l2_k2))*2*(a_l2_ij - t2_k2_ij);
alpha_ij = alpha_ij + new_delta;
end
alpha_ij = -1 * alpha_ij * exp(-z_l1_ij)*2*(xi_part - t_l1_ij);
dJ_dt1(:,i,j) = alpha_ij;
end
end
end
Here is the code to compute the numerical derivatives (which is correct and working as expected):
function [ dJ_dt1_numerical ] = compute_numerical_derivatives( x, c, t1, t2, eps)
% Computes t1 according to:
% df/dt1 numerically
% Input:
% x = data (D x 1)
% c = weights (K2 x 1)
% t1 = centers (Dp x Dd x Np)
% t2 = centers (K1 x K2)
% Output:
% dJ_dt1_numerical = gradient (Dp x Dd x Np)
[Dp, Dd, Np] = size(t1);
dJ_dt1_numerical = zeros(Dp, Dd, Np);
for np=1:Np
for dd=1:Dd
for dp=1:Dp
e_dd_dp_np = zeros(Dp, Dd, Np);
e_dd_dp_np(dp,dd,np) = eps;
f_e1 = f_star_loops(x,c,t1+e_dd_dp_np,t2);
f_e2 = f_star_loops(x,c,t1-e_dd_dp_np,t2);
numerical_derivative = (f_e1 - f_e2)/(2*eps);
dJ_dt1_numerical(dp,dd,np) = numerical_derivative;
end
end
end
end
I will also provide the code for f and the numbers I actually used, in case people want to reproduce my results. Here is the code for what f does (which is also correct and working as expected):
function [ f, z_l1, z_l2, a_l2, a_l3 ] = f_star_loops( x, c, t1, t2)
%f_star_loops - computes 2 layer HBF predictor
% Computes f^*(x) = sum_i c_i a^(3)_i
% Inputs:
% x = data point (D x 1)
% x = [x1, ..., x_np, ..., x_Np]
% c = weights (K2 x 1)
% t2 = centers (K1 x K2)
% t1 = centers (Dp x Dd x Np)
% Outputs:
% f = f^*(x) = sum_i c_i a^(3)_i
% a_l3 = activations l3 (K2 x 1)
% z_l2 = inputs l2 (K2 x 1)
% a_l2 = activations l2 (Np x Dd)
% z_l1 = inputs l1 (Np x Dd)
[Dp, Dd, Np] = size(t1);
z_l1 = zeros(Np, Dd);
a_l2 = zeros(Np, Dd);
x_parts = reshape(x, [Dp, Np]);
%% Compute components of 1st layer z_l1 and a_l2
for np=1:Np
x_np = x_parts(:,np);
t1_np = t1(:,:, np);
for dd=1:Dd
t1_np_dd = t1_np(:, dd);
z_l1_np_dd = norm(t1_np_dd - x_np, 2)^2;
a_l1_np_dd = exp(-z_l1_np_dd);
% a_l1_np_dd = -z_l1_np_dd;
% a_l1_np_dd = sin(-z_l1_np_dd);
% insert
a_l2(np, dd) = a_l1_np_dd;
z_l1(np, dd) = z_l1_np_dd;
end
end
%% Compute components of 2nd layer z_l2
K1 = Dd*Np;
K2 = length(c);
a_l2_vec = reshape(a_l2', [K1,1]);
z_l2 = zeros(K2, 1);
for k2=1:K2
t2_k2 = t2(:, k2); % K1 x 1
z_l2_k2 = norm(t2_k2 - a_l2_vec, 2)^2;
% insert
z_l2(k2) = z_l2_k2;
end
%% Output layer (3rd layer)
a_l3 = exp(-z_l2);
% a_l3 = -z_l2;
% a_l3 = sin(-z_l2);
f = c' * a_l3;
end
Here is the data I used for testing:
%% Test 1:
% dimensions
disp('>>>>>>++++======--------> update t1 unit test');
% fake data & params
x = (1:6)'/norm(1:6,2)
c = [29, 30, 31, 32]'
t2 = [(13:16)/norm((13:16),2); (17:20)/norm((17:20),2); (21:24)/norm((21:24),2); (25:28)/norm((25:28),2)]'
Dp = 3;
Dd = 2;
Np = 2;
t1 = zeros(Dp,Dd, Np); % (Dp, Dd, Np)
t1(:,:,1) = [(1:3)/norm((1:3),2); (4:6)/norm((4:6),2)]';
t1(:,:,2) = [(7:9)/norm((7:9),2); (10:12)/norm((10:12),2)]';
t1
% call f(x)
[f, z_l1, z_l2, a_l2, a_l3 ] = f_star_loops(x,c,t1,t2)
% gradient
df_dt1_loops = compute_df_dt1_loops3(t1,x,z_l1,z_l2,a_l2,c,t2);
df_dt1_loops2 = compute_df_dt1_loops3(t1,x,z_l1,z_l2,a_l2,c,t2);
eps = 1e-10;
dJ_dt1_numerical = compute_numerical_derivatives( x, c, t1, t2, eps);
disp('---- Derivatives ----');
for np=1:Np
np
dJ_dt1_numerical_np = dJ_dt1_numerical(:,:,np);
dJ_dt1_numerical_np
df_dt1_loops2_np = df_dt1_loops(:,:,np);
df_dt1_loops2_np
end
Notice that the numerical derivatives are now correct (I am sure because I compared them to values returned by Mathematica, which matched, and f has been debugged, so it works as I desire).
Here is an example of the output (where the matrices of numerical derivatives should match the matrices of the derivatives using my equations):
---- Derivatives ----
np =
1
dJ_dt1_numerical_np =
7.4924 13.1801
14.9851 13.5230
22.4777 13.8660
df_dt1_loops2_np =
7.4925 5.0190
14.9851 6.2737
22.4776 7.5285
np =
2
dJ_dt1_numerical_np =
11.4395 13.3836
6.9008 6.6363
2.3621 -0.1108
df_dt1_loops2_np =
14.9346 13.3835
13.6943 6.6363
12.4540 -0.1108
Upvotes: 1
Views: 294
Reputation: 35080
Update: there was some misunderstanding on my part about the indices of some quantities in the formula; see also the updated question. I left the original answer below (as the vectorization should be carried out the same way), and at the end I added, for completeness, the final vectorized version corresponding to the OP's actual problem.
There are some inconsistencies between your codes and your formula. In your formula you have a reference to x_i; however, the corresponding size of your x array is that for the index j. This, then, is in agreement with your math.stackexchange question, where i and j seem to be interchanged with respect to the notation you use here...
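To make the mismatch concrete, here is a tiny illustration (sizes are arbitrary):

Dp = 3; Dd = 2; Np = 4;                    % arbitrary illustrative sizes
x_for_index_i = rand(Dp*Dd, 1);            % length the formula's x_i suggests (D = Dp*Dd)
x_for_index_j = rand(Dp*Np, 1);            % length your code actually assumes (D = Dp*Np)
size(reshape(x_for_index_i, [Dp, Dd]))     % Dp x Dd: one column per i
size(reshape(x_for_index_j, [Dp, Np]))     % Dp x Np: one column per j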
Anyway, here's a fixed looped version of your function:
function [ dJ_dt1 ] = compute_t1_gradient_loops(t1,x,y,f,z_l1,z_l2,a_l2,c,t2)
%compute_t1_gradient_loops - computes the t1 parameter of a 2 layer HBF
% Input:
% t1 = (Dp x Dd x Np)
% x = (D x 1)
% z_l1 = (Np x Dd)
% z_l2 = (K2 x 1)
% a_l2 = (Np x Dd)
% c = (K2 x 1)
% t2 = (K1 x K2)
%
% K1=Dd*Np
% D=Dp*Dd
% Dp,Np,Dd,K2 unique
%
% Output:
% dJ_dt1 = gradient (Dp x Dd x Np)
[Dp, ~, ~] = size(t1); %(Dp x Dd x Np)
[Np, Dd] = size(a_l2);
K2 = length(c);
t2_tensor = reshape(t2, Dd, Np, K2); %Dd x Np x K2
x_parts = reshape(x, [Dp, Dd]); %Dp x Dd
dJ_dt1 = zeros(Dp, Dd, Np); %Dp x Dd x Np
for i=1:Dd
xi = x_parts(:,i);
for j=1:Np
t_l1_ij = t1(:,i,j);
a_l2_ij = a_l2(j, i);
z_l1_ij = z_l1(j,i);
alpha_ij = 0;
for k2=1:K2
t2_k2ij = t2_tensor(i,j,k2);
c_k2 = c(k2);
z_l2_k2 = z_l2(k2);
new_delta = c_k2*exp(-z_l2_k2)*(a_l2_ij - t2_k2ij);
alpha_ij = alpha_ij + new_delta;
end
alpha_ij = -4*alpha_ij* exp(-z_l1_ij)*(xi - t_l1_ij);
dJ_dt1(:,i,j) = alpha_ij;
end
end
end
Some things to note:

- I changed the size of x to D=Dp*Dd in order to keep the i index of the formula. Otherwise more things would have to be rethought.
- Instead of [Dp, ~, ~] = size(t1); you could just use Dp=size(t1,1).
- I fixed the update of alpha_ij after the sum, because you overwrote the old value with the prefactor instead of multiplying it.

If I misinterpreted your intentions, then let me know and I'll change the looped version accordingly.
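The vectorized version below leans heavily on bsxfun (automatic expansion of singleton dimensions) and permute (reordering of array dimensions); here is a tiny standalone illustration of both, unrelated to the actual problem:

A = rand(3,1);               % 3 x 1
B = rand(1,4);               % 1 x 4
C = bsxfun(@times, A, B);    % 3 x 4: the singleton dimensions are expanded automatically
T = rand(2,3,4);
P = permute(T, [3 1 2]);     % 4 x 2 x 3: new dimension k is old dimension order(k), here order = [3 1 2]
size(C)
size(P)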
Assuming that the looping version does what you want, here's a vectorized version, similar to your original attempt:
function [ dJ_dt1 ] = compute_t1_gradient_vect(t1,x,y,f,z_l1,z_l2,a_l2,c,t2)
%compute_t1_gradient_vect - computes the t1 parameter of a 2 layer HBF
% Input:
% t1 = (Dp x Dd x Np)
% x = (D x 1)
% y = (1 x 1)
% f = (1 x 1)
% z_l1 = (Np x Dd)
% z_l2 = (K2 x 1)
% a_l2 = (Np x Dd)
% c = (K2 x 1)
% t2 = (K1 x K2)
%
% K1=Dd*Np
% D=Dp*Dd
% Dp,Np,Dd,K2 unique
%
% Output:
% dJ_dt1 = gradient (Dp x Dd x Np)
Dp = size(t1,1);
[Np, Dd] = size(a_l2);
K2 = length(c);
t2_tensor = reshape(t2, Dd, Np, K2); %Dd x Np x K2
x_parts = reshape(x, [Dp, Dd]); %Dp x Dd
%reorder things to align for bsxfun later
a_l2=a_l2'; %Dd x Np <-> i,j
z_l1=z_l1'; %Dd x Np <-> i,j
t2_tensor = permute(t2_tensor,[3 1 2]); %K2 x Dd x Np
%the 1D part of the sum to be used in partialsum
%prefactors also put here to minimize computational effort
tempvar_k2 = -4*c.*exp(-z_l2); % K2 x 1
%compute sum(b(k)*(c-d(k)) as c*sum(b(k))-sum(b(k)*d(k)) (NB)
partialsum = a_l2*sum(tempvar_k2) ...
-squeeze(sum(bsxfun(@times,tempvar_k2,t2_tensor),1)); %Dd x Np
%alternative computation by definition:
%partialsum = bsxfun(@minus,a_l2,t2_tensor); %Dd x Np x K2
%partialsum = permute(partialsum,[3 1 2]); %K2 x Dd x Np
%partialsum = squeeze(sum(bsxfun(@times,tempvar_k2,partialsum),1)); %Dd x Np
%last part of the formula, (x-t1)
tempvar_lastterm = bsxfun(@minus,x_parts,t1); %Dp x Dd x Np
tempvar_lastterm = permute(tempvar_lastterm,[2 3 1]); %Dd x Np x Dp
%put together what we have
dJ_dt1 = bsxfun(@times,partialsum.*exp(-z_l1),tempvar_lastterm); %Dd x Np x Dp
dJ_dt1 = permute(dJ_dt1,[3 1 2]); %Dp x Dd x Np
end
Again, some things to note:

- I introduced a temporary variable for the k2-dependent part of the sum, as it is used twice in the next step.
- I also attached the prefactor -4 to this variable, since you will only have to multiply K2 times instead of Dp*Dd*Np times, which probably makes a difference for large matrices.
- I rearranged the k2 sum by separating (a-t2) into two sums, see the comment ending in (NB). It turns out that for large matrices (multiply your nice test cases with dims 2-3-4-5 by 100) this separation leads to considerable speedup. Of course if K2 is much larger than the inner dimensions of t2 then you lose on this trick.
- If you really meant x_j rather than x_i, then the dimensions have to be juggled around accordingly.

I checked the looped and both of my vectorized versions for two test cases. First, your original example of
%% update t1 unit test
%% dimensions
Dp = 3;
Np = 4;
Dd = 2;
K2 = 5;
K1 = Dd * Np;
%% fake data & params
x = (1:Dp*Dd)';
y = 3;
c = (1:K2)';
t2 = rand(K1, K2);
t1 = rand(Dp, Dd, Np);
%% update gradient
dJ_dt1_ij_loops = compute_t1_gradient_loops(t1,x,y,f,z_l1,z_l2,a_l2,c,t2);
dJ_dt1_vect = compute_t1_gradient_vect(t1,x,y,f,z_l1,z_l2,a_l2,c,t2);
dJ_dt1_vect2 = compute_t1_gradient_vect2(t1,x,y,f,z_l1,z_l2,a_l2,c,t2);
Note that again I changed the definition of x, and ..._vect2 stands for the "naive" version of the vectorized code. It turns out that the resulting derivatives agree exactly for the looping version and the naive vectorized one, while there is a maximal 2e-14 difference between them and the optimized vector version. Which means we're good. And the difference near machine precision is simply due to the fact that the computations were performed in a different order.
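As a standalone sanity check of the sum separation used in the optimized version (the step marked (NB)), here is a minimal sketch on random data:

% check sum_k b(k)*(s - d(k)) == s*sum_k b(k) - sum_k b(k)*d(k)
K2 = 5;
b = rand(K2,1); d = rand(K2,1); s = rand;
direct    = sum(b .* (s - d));
separated = s*sum(b) - sum(b .* d);
abs(direct - separated)      % zero up to rounding error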
In order to gauge performance, I multiplied the dimensions of the original test case by 100:
%% dimensions
Dp = 300;
Np = 400;
Dd = 200;
K2 = 500;
K1 = Dd * Np;
And I also set variables to check cputime before and after each function call (since tic/toc only measures wall clock time). The measured times were 23 seconds, 2 seconds and 4 seconds for the looped, the optimized and the "naive" vector version, respectively. On the other hand, the maximal difference between the two latter derivatives was now 1.8e-5. Of course, our test data are random, which are not the best conditioned data to say the least. Possibly in an actual application this difference will not be an issue, but you should always be careful with loss of precision (and we're specifically subtracting two possibly large numbers in the optimized version).
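A tiny illustration of the kind of precision loss meant here (unrelated to the actual gradient data):

big   = 1e8;
small = 0.1;
((big + small) - big) - small   % not exactly zero: digits of 'small' are lost next to 'big'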
You could of course try playing around with how you partition your formula into the terms by which you compute it; there might be an even more efficient way. It could also all depend on the size of your arrays.
You mentioned that you tried to estimate the derivative from the definition, basically using a symmetric derivative. You didn't get what you expected, probably because of the shortcomings of your original functions. However, I'd like to note a few things here too. The fact that your epsilon-version does not agree with your original tries could be due to

- errors in J (I know you're trying to debug this case on math.SE)
- errors in the J function computing your symmetric derivative, which is only spoken of in your question

If everything checks out, you could still have a purely mathematical source of disagreement: the factor of epsilon=1e-4 you use is completely arbitrary. When you check your derivative this way you basically linearize your function around a given point. If your function varies too much (i.e. too nonlinearly) in a neighbourhood of radius epsilon, your symmetric derivative will be inaccurate compared to the exact value. When doing these checks you should be careful to use adequately small parameters in your derivative: small enough to expect linear behaviour of your function, but large enough to avoid numerical noise arising from the 1/epsilon factor.
Final note: you should probably avoid naming a variable eps in matlab, as that's a built-in function telling you the "machine epsilon" (have a look at help eps), corresponding to the precision at number 1 by default (i.e. without an input argument). While you can call the complex unit 1i if you have a variable called i, it's probably safer to avoid built-in names if possible.
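For reference, this is what the built-in returns:

eps        % machine epsilon at 1, about 2.2204e-16
eps(1e8)   % spacing of doubles around 1e8, much larger than eps(1)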
Updated final vectorized version to correspond to the updated question of OP:
function [ dJ_dt1 tempout] = compute_t1_gradient_vect(t1,x,z_l1,z_l2,a_l2,c,t2)
%compute_t1_gradient_vect - computes the t1 parameter of a 2 layer HBF
% Input:
% t1 = (Dp x Dd x Np)
% x = (D x 1)
% z_l1 = (Np x Dd)
% z_l2 = (K2 x 1)
% a_l2 = (Np x Dd)
% c = (K2 x 1)
% t2 = (K1 x K2)
%
% K1=Dd*Np
% D=Dp*Np
% Dp,Np,Dd,K2 unique
%
% Output:
% dJ_dt1 = gradient (Dp x Dd x Np)
Dp = size(t1,1);
[Np, Dd] = size(a_l2);
K2 = length(c);
t2_tensor = reshape(t2, Dd, Np, K2); %Dd x Np x K2
x_parts = reshape(x, [Dp, Np]); %Dp x Np
t1 = permute(t1,[1 3 2]); %Dp x Np x Dd
a_l2=a_l2'; %Dd x Np <-> j,i
z_l1=z_l1'; %Dd x Np <-> j,i
tempvar_k2 = -4*c.*exp(-z_l2); % K2 x 1
partialsum = bsxfun(@minus,a_l2,t2_tensor); %Dd x Np x K2
partialsum = permute(partialsum,[3 1 2]); %K2 x Dd x Np
partialsum = squeeze(sum(bsxfun(@times,tempvar_k2,partialsum),1)); %Dd x Np
tempvar_lastterm = bsxfun(@minus,x_parts,t1); %Dp x Np x Dd
tempvar_lastterm = permute(tempvar_lastterm,[3 2 1]); %Dd x Np x Dp
dJ_dt1 = bsxfun(@times,partialsum.*exp(-z_l1),tempvar_lastterm); %Dd x Np x Dp
tempout=tempvar_lastterm;
dJ_dt1 = permute(dJ_dt1,[3 1 2]); %Dp x Dd x Np
end
Note that this is almost identical to the original vectorized version, only the dimensions of x are changed and some indices have been permuted.
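For completeness, a sketch of how this updated version plugs into the small test case from the question (assuming x, c, t1, t2 are set up as in the question's Test 1 and that the question's f_star_loops and compute_numerical_derivatives are on the path):

[f, z_l1, z_l2, a_l2, a_l3] = f_star_loops(x, c, t1, t2);
[df_dt1_vect, ~] = compute_t1_gradient_vect(t1, x, z_l1, z_l2, a_l2, c, t2);
df_dt1_num = compute_numerical_derivatives(x, c, t1, t2, 1e-10);
max(abs(df_dt1_vect(:) - df_dt1_num(:)))   % should be small if the analytic gradient is right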
Upvotes: 1