Reputation: 5231
I was trying to implement the following equation in MATLAB:

To explain some of the notation: df/dt^(1)_{i,j} should be a vector, z^{(2)}_{k2} is a real number, a^{(2)}_{i,j} is a real number, [t^{(2)}_{k2}] is a vector, x_i is a vector, and t^{(1)}_{i,j} is a vector. For more clarifying comments on the notation, have a look at the related math.stackexchange question. Also, I've tried to comment the code heavily, stating what the inputs and outputs should be, to minimize confusion about the dimensions of the variables in question.
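To make the dimension conventions concrete, here is a small illustrative sketch (the sizes are arbitrary and only the shapes matter; the reshapes mirror the ones used in my code below):

% arbitrary sizes, purely to illustrate the shapes used in the code below
Dp = 3; Dd = 2; Np = 4; K2 = 5;
K1 = Dd * Np;
t1 = rand(Dp, Dd, Np);               % each t1(:,a,b) is one length-Dp center vector
x  = rand(Dp * Np, 1);               % data vector, split into Np parts of length Dp
x_parts = reshape(x, [Dp, Np]);      % one length-Dp part per column
z_l1 = rand(Np, Dd);                 % layer-1 inputs, one scalar per (part, center) pair
a_l2 = exp(-z_l1);                   % layer-2 activations, same shape as z_l1
a_l2_vec = reshape(a_l2', [K1, 1]);  % flattened activations fed into the second layer
t2 = rand(K1, K2);                   % each column t2(:,k2) is one length-K1 center vector
c  = rand(K2, 1);                    % output weights
z_l2 = rand(K2, 1);                  % layer-2 inputs, one scalar per k2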
I actually do have one potential implementation (that I believe is correct), but MATLAB sometimes has some nice hidden tricks, so I was wondering if this is a good implementation of the vectorized equation above or if there is a better one.
Currently here is my code:
function [ dJ_dt1 ] = compute_t1_gradient(t1,x,y,f,z_l1,z_l2,a_l2,c,t2,lambda)
%compute_t1_gradient_loops - computes the t1 parameter of a 2 layer HBF
% Computes dJ_dt1 according to:
% dJ_dt1
% Input:
% t1 = centers (Dp x Dd x Np)
% x = data (D x 1)
% y = label (1 x 1)
% f = f(x) (1 x 1)
% z_l1 = inputs l1 (Np x Dd)
% z_l2 = inputs l2 (K2 x 1)
% a_l2 = activations l2 (Np x Dd)
% a_l3 = activations l3 (K2 x 1)
% c = weights (K2 x 1)
% t2 = centers (K1 x K2)
% lambda = reg param (1 x 1)
% mu_c = step size (1 x 1)
% Output:
% dJ_dt1 = gradient (Dp x Dd x Np)
[Dp, ~, ~] = size(t1);
[Np, Dd] = size(a_l2);
x_parts = reshape(x, [Dp, Np])'; % Np x Dp
K1 = Np * Dd;
a_l2_col_vec = reshape(a_l2', [K1, 1]); %K1 x 1
alpha = bsxfun(@minus, a_l2_col_vec, t2); %K1 x K2
c_z_l2 = (c .* exp(-z_l2))'; % 1 x K2
alpha = bsxfun(@times, c_z_l2, alpha); %K1 x K2
alpha = bsxfun(@times, reshape(exp(-z_l1'),[K1, 1]) , alpha);
alpha = sum(alpha, 2); %K1 x 1
xi_t1 = bsxfun(@minus, x_parts', permute(t1, [1,3,2]));
% alpha K1 x 1
% xi_t1 Dp x Np x Dd
dJ_dt1 = bsxfun(@minus, reshape(alpha,[Dd, Np]), permute(xi_t1, [3, 2, 1]));
dJ_dt1 = permute(dJ_dt1,[3,1,2]);
dJ_dt1 = -4*(y-f)*dJ_dt1;
dJ_dt1 = dJ_dt1 + lambda * 0; %TODO
end
Actually, at this point I decided to implement the above function again as a for loop. Unfortunately, they don't produce the same answer, which makes me skeptical that the above is correct. I will paste the for loop code of what I wanted/intended to vectorize:
function [ dJ_dt1 ] = compute_t1_gradient_loops(t1,x,y,f,z_l1,z_l2,a_l2,c,t2)
%compute_t1_gradient_loops - computes the t1 parameter of a 2 layer HBF
% Computes t1 according to:
% t1 := t1 - mu_c * dJ/dt1
% Input:
% t1 = centers (Dp x Dd x Np)
% x = data (D x 1)
% y = label (1 x 1)
% f = f(x) (1 x 1)
% z_l1 = inputs l1 (Np x Dd)
% z_l2 = inputs l2 (K2 x 1)
% a_l2 = activations l2 (Np x Dd)
% a_l3 = activations l3 (K2 x 1)
% c = weights (K2 x 1)
% t2 = centers (K1 x K2)
% lambda = reg param (1 x 1)
% mu_c = step size (1 x 1)
% Output:
% dJ_dt1 = gradient (Dp x Dd x Np)
[Dp, ~, ~] = size(t1); %(Dp x Dd x Np)
[Np, Dd] = size(a_l2);
K2 = length(c);
t2_tensor = reshape(t2, Dd, Np, K2);
x_parts = reshape(x, [Dp, Np]);
dJ_dt1 = zeros(Dp, Dd, Np);
for i=1:Dd
xi = x_parts(:,i);
for j=1:Np
t_l1_ij = t1(:,i,j);
a_l2_ij = a_l2(j, i);
z_l1_ij = z_l1(j,i);
alpha_ij = 0;
for k2=1:K2
t2_k2ij = t2_tensor(i,j,k2);
c_k2 = c(k2);
z_l2_k2 = z_l2(k2);
new_delta = c_k2*-1*exp(-z_l2_k2)*2*(a_l2_ij - t2_k2ij);
alpha_ij = alpha_ij + new_delta;
end
alpha_ij = 2*(y-f)*-1*exp(-z_l1_ij)*2*(xi - t_l1_ij);
dJ_dt1(:,i,j) = alpha_ij;
end
end
end
I've actually even approximated the derivative the way Andrew Ng suggests for checking gradient-descent-like equations, i.e. with the symmetric difference quotient (J(t1 + eps) - J(t1 - eps)) / (2*eps); to that end I even wrote the code for it:
%% update t1 unit test
%% dimensions
Dp = 3;
Np = 4;
Dd = 2;
K2 = 5;
K1 = Dd * Np;
%% fake data & params
x = (1:Dp*Np)';
y = 3;
c = (1:K2)';
t2 = rand(K1, K2);
t1 = rand(Dp, Dd, Np);
lambda = 0;
mu_t1 = 1;
%% call f(x)
[f, z_l1, z_l2, a_l2, ~ ] = f_star(x,c,t1,t2,Np,Dp);
%% update gradient
dJ_dt1_ij_loops = compute_t1_gradient_loops(t1,x,y,f,z_l1,z_l2,a_l2,c,t2);
dJ_dt1 = compute_t1_gradient(t1,x,y,f,z_l1,z_l2,a_l2,c,t2,lambda);
eps = 1e-4;
e_111 = zeros( size(t1) );
e_111(1,1,1) = eps;
derivative = (J(y, x, c, t2, t1 + e_111, Np, Dp) - J(y, x, c, t2, t1 - e_111, Np, Dp) ) / (2*eps);
derivative
dJ_dt1_ij_loops(1,1,1)
dJ_dt1(1,1,1)
but it seems that neither of the derivatives agrees with the "approximated" one. The output for one run looked as follows:
>> update_t1_gradient_unit_test
derivative =
0.0027
dJ_dt1_ij_loops
ans =
0.0177
dJ_dt1
ans =
-0.5182
>>
so it isn't clear to me whether there is a mistake or not... it seems that it nearly matches the one with the loop, but is that close enough?
Andrew Ng does say that the numerical approximation and the analytic gradient should agree to about 4 significant digits; however, I don't see 4 significant digits agreeing! Not even the same order of magnitude :( I'd guess both are wrong but I can't seem to catch why or where/how.
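(For what it's worth, one common way to quantify how well two gradient arrays agree, instead of eyeballing single entries, is a relative error over all entries; this is not in my script above, just a sketch of the idea:)

g_loops = dJ_dt1_ij_loops(:);   % flattened gradient from the looped version
g_vect  = dJ_dt1(:);            % flattened gradient from the vectorized version
rel_err = norm(g_loops - g_vect) / (norm(g_loops) + norm(g_vect))
% roughly: values near machine precision mean the two agree, values near 1 mean they do not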
On a related note, I've also asked whether the derivative that I have at the top is actually mathematically correct, since at this point I am not sure which part is wrong and which part is correct. The link to the question is here:
Update:
I have implemented a new version of the derivative with loops and it nearly agrees with a small example I created.
Here is the new implementation (with a bug somewhere...):
function [ dJ_dt1 ] = compute_df_dt1_loops3(t1,x,z_l1,z_l2,a_l2,c,t2)
% Computes t1 according to:
% df/dt1
% Input:
% t1 = centers (Dp x Dd x Np)
% x = data (D x 1)
% z_l1 = inputs l1 (Np x Dd)
% z_l2 = inputs l2 (K2 x 1)
% a_l2 = activations l2 (Np x Dd)
% a_l3 = activations l3 (K2 x 1)
% c = weights (K2 x 1)
% t2 = centers (K1 x K2)
% Output:
% dJ_dt1 = gradient (Dp x Dd x Np)
[Dp, Dd, Np] = size(t1); %(Dp x Dd x Np)
K2 = length(c);
x_parts = reshape(x, [Dp, Np]);
dJ_dt1 = zeros(Dp, Dd, Np);
for i=1:Np
xi_part = x_parts(:,i);
for j=1:Dd
z_l1_ij = z_l1(i,j);
a_l2_ij = a_l2(i,j);
t_l1_ij = t1(:,i,j);
alpha_ij = 0;
for k2=1:K2
ck2 = c(k2);
t2_k2 = t2(:, k2);
index = (i-1)*Dd + j;
t2_k2_ij = t2_k2(index);
z_l2_k2 = z_l2(k2);
new_delta = ck2*(exp(-z_l2_k2))*2*(a_l2_ij - t2_k2_ij);
alpha_ij = alpha_ij + new_delta;
end
alpha_ij = -1 * alpha_ij * exp(-z_l1_ij)*2*(xi_part - t_l1_ij);
dJ_dt1(:,i,j) = alpha_ij;
end
end
end
Here is the code to compute the numerical derivatives (which is correct and working as expected):
function [ dJ_dt1_numerical ] = compute_numerical_derivatives( x, c, t1, t2, eps)
% Computes t1 according to:
% df/dt1 numerically
% Input:
% x = data (D x 1)
% c = weights (K2 x 1)
% t1 = centers (Dp x Dd x Np)
% t2 = centers (K1 x K2)
% Output:
% dJ_dt1_numerical = gradient (Dp x Dd x Np)
[Dp, Dd, Np] = size(t1);
dJ_dt1_numerical = zeros(Dp, Dd, Np);
for np=1:Np
for dd=1:Dd
for dp=1:Dp
e_dd_dp_np = zeros(Dp, Dd, Np);
e_dd_dp_np(dp,dd,np) = eps;
f_e1 = f_star_loops(x,c,t1+e_dd_dp_np,t2);
f_e2 = f_star_loops(x,c,t1-e_dd_dp_np,t2);
numerical_derivative = (f_e1 - f_e2)/(2*eps);
dJ_dt1_numerical(dp,dd,np) = numerical_derivative;
end
end
end
end
I will also provide the code for f and the numbers I actually used, in case people want to reproduce my results. Here is the code for what f does (which is also correct and working as expected):
function [ f, z_l1, z_l2, a_l2, a_l3 ] = f_star_loops( x, c, t1, t2)
%f_star_loops - computes 2 layer HBF predictor
% Computes f^*(x) = sum_i c_i a^(3)_i
% Inputs:
% x = data point (D x 1)
% x = [x1, ..., x_np, ..., x_Np]
% c = weights (K2 x 1)
% t2 = centers (K1 x K2)
% t1 = centers (Dp x Dd x Np)
% Outputs:
% f = f^*(x) = sum_i c_i a^(3)_i
% a_l3 = activations l3 (K2 x 1)
% z_l2 = inputs l2 (K2 x 1)
% a_l2 = activations l2 (Np x Dd)
% z_l1 = inputs l1 (Np x Dd)
[Dp, Dd, Np] = size(t1);
z_l1 = zeros(Np, Dd);
a_l2 = zeros(Np, Dd);
x_parts = reshape(x, [Dp, Np]);
%% Compute components of 1st layer z_l1 and a_l2
for np=1:Np
x_np = x_parts(:,np);
t1_np = t1(:,:, np);
for dd=1:Dd
t1_np_dd = t1_np(:, dd);
z_l1_np_dd = norm(t1_np_dd - x_np, 2)^2;
a_l1_np_dd = exp(-z_l1_np_dd);
% a_l1_np_dd = -z_l1_np_dd;
% a_l1_np_dd = sin(-z_l1_np_dd);
% insert
a_l2(np, dd) = a_l1_np_dd;
z_l1(np, dd) = z_l1_np_dd;
end
end
%% Compute components of 2nd layer z_l2
K1 = Dd*Np;
K2 = length(c);
a_l2_vec = reshape(a_l2', [K1,1]);
z_l2 = zeros(K2, 1);
for k2=1:K2
t2_k2 = t2(:, k2); % K1 x 1
z_l2_k2 = norm(t2_k2 - a_l2_vec, 2)^2;
% insert
z_l2(k2) = z_l2_k2;
end
%% Output layer (3rd layer)
a_l3 = exp(-z_l2);
% a_l3 = -z_l2;
% a_l3 = sin(-z_l2);
f = c' * a_l3;
end
Here is the data I used for testing:
%% Test 1:
% dimensions
disp('>>>>>>++++======--------> update t1 unit test');
% fake data & params
x = (1:6)'/norm(1:6,2)
c = [29, 30, 31, 32]'
t2 = [(13:16)/norm((13:16),2); (17:20)/norm((17:20),2); (21:24)/norm((21:24),2); (25:28)/norm((25:28),2)]'
Dp = 3;
Dd = 2;
Np = 2;
t1 = zeros(Dp,Dd, Np); % (Dp, Dd, Np)
t1(:,:,1) = [(1:3)/norm((1:3),2); (4:6)/norm((4:6),2)]';
t1(:,:,2) = [(7:9)/norm((7:9),2); (10:12)/norm((10:12),2)]';
t1
% call f(x)
[f, z_l1, z_l2, a_l2, a_l3 ] = f_star_loops(x,c,t1,t2)
% gradient
df_dt1_loops = compute_df_dt1_loops3(t1,x,z_l1,z_l2,a_l2,c,t2);
df_dt1_loops2 = compute_df_dt1_loops3(t1,x,z_l1,z_l2,a_l2,c,t2);
eps = 1e-10;
dJ_dt1_numerical = compute_numerical_derivatives( x, c, t1, t2, eps);
disp('---- Derivatives ----');
for np=1:Np
np
dJ_dt1_numerical_np = dJ_dt1_numerical(:,:,np);
dJ_dt1_numerical_np
df_dt1_loops2_np = df_dt1_loops(:,:,np);
df_dt1_loops2_np
end
Notice that the numerical derivatives are now correct (I am sure because I compared them to values returned by Mathematica, which matched, and f has been debugged, so it works as I desire).
Here is an example of the output (where the matrices of numerical derivatives should match the matrices of the derivatives using my equations):
---- Derivatives ----
np =
1
dJ_dt1_numerical_np =
7.4924 13.1801
14.9851 13.5230
22.4777 13.8660
df_dt1_loops2_np =
7.4925 5.0190
14.9851 6.2737
22.4776 7.5285
np =
2
dJ_dt1_numerical_np =
11.4395 13.3836
6.9008 6.6363
2.3621 -0.1108
df_dt1_loops2_np =
14.9346 13.3835
13.6943 6.6363
12.4540 -0.1108
Upvotes: 1
Views: 294
Reputation: 35080
Update: there was some misunderstanding on my part about the indices of some quantities in the formula; see also the updated question. I left the original answer below (as the vectorization should be carried out the same way), and at the end I added, for completeness, the final vectorized version corresponding to the OP's actual problem.
There are some inconsistencies between your codes and your formula. In your formula you have a reference to x_i; however, the corresponding size of your x array is that for the index j. This, then, is in agreement with your math.stackexchange question, where i and j seem to be interchanged with respect to the notation you use here...
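To make the mismatch concrete, here is a tiny illustration (sizes are arbitrary):

Dp = 3; Dd = 2; Np = 4;                    % arbitrary illustrative sizes
x_for_index_i = rand(Dp*Dd, 1);            % length the formula's x_i suggests (D = Dp*Dd)
x_for_index_j = rand(Dp*Np, 1);            % length your code actually assumes (D = Dp*Np)
size(reshape(x_for_index_i, [Dp, Dd]))     % Dp x Dd: one column per i
size(reshape(x_for_index_j, [Dp, Np]))     % Dp x Np: one column per j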
Anyway, here's a fixed looped version of your function:
function [ dJ_dt1 ] = compute_t1_gradient_loops(t1,x,y,f,z_l1,z_l2,a_l2,c,t2)
%compute_t1_gradient_loops - computes the t1 parameter of a 2 layer HBF
% Input:
% t1 = (Dp x Dd x Np)
% x = (D x 1)
% z_l1 = (Np x Dd)
% z_l2 = (K2 x 1)
% a_l2 = (Np x Dd)
% c = (K2 x 1)
% t2 = (K1 x K2)
%
% K1=Dd*Np
% D=Dp*Dd
% Dp,Np,Dd,K2 unique
%
% Output:
% dJ_dt1 = gradient (Dp x Dd x Np)
[Dp, ~, ~] = size(t1); %(Dp x Dd x Np)
[Np, Dd] = size(a_l2);
K2 = length(c);
t2_tensor = reshape(t2, Dd, Np, K2); %Dd x Np x K2
x_parts = reshape(x, [Dp, Dd]); %Dp x Dd
dJ_dt1 = zeros(Dp, Dd, Np); %Dp x Dd x Np
for i=1:Dd
xi = x_parts(:,i);
for j=1:Np
t_l1_ij = t1(:,i,j);
a_l2_ij = a_l2(j, i);
z_l1_ij = z_l1(j,i);
alpha_ij = 0;
for k2=1:K2
t2_k2ij = t2_tensor(i,j,k2);
c_k2 = c(k2);
z_l2_k2 = z_l2(k2);
new_delta = c_k2*exp(-z_l2_k2)*(a_l2_ij - t2_k2ij);
alpha_ij = alpha_ij + new_delta;
end
alpha_ij = -4*alpha_ij* exp(-z_l1_ij)*(xi - t_l1_ij);
dJ_dt1(:,i,j) = alpha_ij;
end
end
end
Some things to note:

- I changed the size of x to D=Dp*Dd in order to keep the i index of the formula. Otherwise more things would have to be rethought.
- Instead of [Dp, ~, ~] = size(t1); you could just use Dp=size(t1,1).
- I fixed the update of alpha_ij after the sum, because you overwrote the old value with the prefactor instead of multiplying it.

If I misinterpreted your intentions, then let me know and I'll change the looped version accordingly.
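The vectorized version below leans heavily on bsxfun (automatic expansion of singleton dimensions) and permute (reordering of array dimensions); here is a tiny standalone illustration of both, unrelated to the actual problem:

A = rand(3,1);               % 3 x 1
B = rand(1,4);               % 1 x 4
C = bsxfun(@times, A, B);    % 3 x 4: the singleton dimensions are expanded automatically
T = rand(2,3,4);
P = permute(T, [3 1 2]);     % 4 x 2 x 3: new dimension k is old dimension order(k), here order = [3 1 2]
size(C)
size(P)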
Assuming that the looping version does what you want, here's a vectorized version, similar to your original attempt:
function [ dJ_dt1 ] = compute_t1_gradient_vect(t1,x,y,f,z_l1,z_l2,a_l2,c,t2)
%compute_t1_gradient_vect - computes the t1 parameter of a 2 layer HBF
% Input:
% t1 = (Dp x Dd x Np)
% x = (D x 1)
% y = (1 x 1)
% f = (1 x 1)
% z_l1 = (Np x Dd)
% z_l2 = (K2 x 1)
% a_l2 = (Np x Dd)
% c = (K2 x 1)
% t2 = (K1 x K2)
%
% K1=Dd*Np
% D=Dp*Dd
% Dp,Np,Dd,K2 unique
%
% Output:
% dJ_dt1 = gradient (Dp x Dd x Np)
Dp = size(t1,1);
[Np, Dd] = size(a_l2);
K2 = length(c);
t2_tensor = reshape(t2, Dd, Np, K2); %Dd x Np x K2
x_parts = reshape(x, [Dp, Dd]); %Dp x Dd
%reorder things to align for bsxfun later
a_l2=a_l2'; %Dd x Np <-> i,j
z_l1=z_l1'; %Dd x Np <-> i,j
t2_tensor = permute(t2_tensor,[3 1 2]); %K2 x Dd x Np
%the 1D part of the sum to be used in partialsum
%prefactors also put here to minimize computational effort
tempvar_k2 = -4*c.*exp(-z_l2); % K2 x 1
%compute sum(b(k)*(c-d(k)) as c*sum(b(k))-sum(b(k)*d(k)) (NB)
partialsum = a_l2*sum(tempvar_k2) ...
-squeeze(sum(bsxfun(@times,tempvar_k2,t2_tensor),1)); %Dd x Np
%alternative computation by definition:
%partialsum = bsxfun(@minus,a_l2,t2_tensor); %Dd x Np x K2
%partialsum = permute(partialsum,[3 1 2]); %K2 x Dd x Np
%partialsum = squeeze(sum(bsxfun(@times,tempvar_k2,partialsum),1)); %Dd x Np
%last part of the formula, (x-t1)
tempvar_lastterm = bsxfun(@minus,x_parts,t1); %Dp x Dd x Np
tempvar_lastterm = permute(tempvar_lastterm,[2 3 1]); %Dd x Np x Dp
%put together what we have
dJ_dt1 = bsxfun(@times,partialsum.*exp(-z_l1),tempvar_lastterm); %Dd x Np x Dp
dJ_dt1 = permute(dJ_dt1,[3 1 2]); %Dp x Dd x Np
end
Again, some things to note:

- I introduced a temporary variable for the k2-dependent part of the sum, as it is used twice in the next step.
- I also attached the prefactor -4 to this variable, since you will only have to multiply K2 times instead of Dp*Dd*Np times, which probably makes a difference for large matrices.
- I rearranged the k2 sum by separating (a-t2) into two sums, see the comment ending in (NB). It turns out that for large matrices (multiply your nice test cases with dims 2-3-4-5 by 100) this separation leads to considerable speedup. Of course if K2 is much larger than the inner dimensions of t2 then you lose on this trick.
- If you really meant x_j rather than x_i, then the dimensions have to be juggled around accordingly.

I checked the looped and both of my vectorized versions for two test cases. First, your original example of
%% update t1 unit test
%% dimensions
Dp = 3;
Np = 4;
Dd = 2;
K2 = 5;
K1 = Dd * Np;
%% fake data & params
x = (1:Dp*Dd)';
y = 3;
c = (1:K2)';
t2 = rand(K1, K2);
t1 = rand(Dp, Dd, Np);
%% update gradient
dJ_dt1_ij_loops = compute_t1_gradient_loops(t1,x,y,f,z_l1,z_l2,a_l2,c,t2);
dJ_dt1_vect = compute_t1_gradient_vect(t1,x,y,f,z_l1,z_l2,a_l2,c,t2);
dJ_dt1_vect2 = compute_t1_gradient_vect2(t1,x,y,f,z_l1,z_l2,a_l2,c,t2);
Note that again I changed the definition of x, and ..._vect2 stands for the "naive" version of the vectorized code. It turns out that the resulting derivatives agree exactly for the looping version and the naive vectorized one, while there is a maximal 2e-14 difference between them and the optimized vector version. Which means we're good. And the difference near machine precision is simply due to the fact that the computations were performed in a different order.
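As a standalone sanity check of the sum separation used in the optimized version (the step marked (NB)), here is a minimal sketch on random data:

% check sum_k b(k)*(s - d(k)) == s*sum_k b(k) - sum_k b(k)*d(k)
K2 = 5;
b = rand(K2,1); d = rand(K2,1); s = rand;
direct    = sum(b .* (s - d));
separated = s*sum(b) - sum(b .* d);
abs(direct - separated)      % zero up to rounding error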
In order to gauge performance, I multiplied the dimensions of the original test case by 100:
%% dimensions
Dp = 300;
Np = 400;
Dd = 200;
K2 = 500;
K1 = Dd * Np;
And I also set variables to check cputime before and after each function call (since tic/toc only measures wall clock time). The measured times were 23 seconds, 2 seconds and 4 seconds for the looped, the optimized and the "naive" vector version, respectively. On the other hand, the maximal difference between the two latter derivatives was now 1.8e-5. Of course, our test data are random, which are not the best conditioned data to say the least. Possibly in an actual application this difference will not be an issue, but you should always be careful with loss of precision (and we're specifically subtracting two possibly large numbers in the optimized version).
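A tiny illustration of the kind of precision loss meant here (unrelated to the actual gradient data):

big   = 1e8;
small = 0.1;
((big + small) - big) - small   % not exactly zero: digits of 'small' are lost next to 'big'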
You could of course try playing around with how you partition your formula into the terms by which you compute it; there might be an even more efficient way. It could also all depend on the size of your arrays.
You mentioned that you tried to estimate the derivative from the definition, basically using a symmetric derivative. You didn't get what you expected, probably because of the shortcomings of your original functions. However, I'd like to note a few things here too. The fact that your epsilon-version does not agree with your original tries could be due to

- errors in J (I know you're trying to debug this case on math.SE)
- errors in the J function computing your symmetric derivative, which is only spoken of in your question

If everything checks out, you could still have a purely mathematical source of disagreement: the factor of epsilon=1e-4 you use is completely arbitrary. When you check your derivative this way you basically linearize your function around a given point. If your function varies too much (i.e. too nonlinearly) in a neighbourhood of radius epsilon, your symmetric derivative will be inaccurate compared to the exact value. When doing these checks you should be careful to use adequately small parameters in your derivative: small enough to expect linear behaviour of your function, but large enough to avoid numerical noise arising from the 1/epsilon factor.
Final note: you should probably avoid naming a variable eps in matlab, as that's a built-in function telling you the "machine epsilon" (have a look at help eps), corresponding to the precision at number 1 by default (i.e. without an input argument). While you can call the complex unit 1i if you have a variable called i, it's probably safer to avoid built-in names if possible.
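For reference, this is what the built-in returns:

eps        % machine epsilon at 1, about 2.2204e-16
eps(1e8)   % spacing of doubles around 1e8, much larger than eps(1)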
Updated final vectorized version to correspond to the updated question of OP:
function [ dJ_dt1 tempout] = compute_t1_gradient_vect(t1,x,z_l1,z_l2,a_l2,c,t2)
%compute_t1_gradient_vect - computes the t1 parameter of a 2 layer HBF
% Input:
% t1 = (Dp x Dd x Np)
% x = (D x 1)
% z_l1 = (Np x Dd)
% z_l2 = (K2 x 1)
% a_l2 = (Np x Dd)
% c = (K2 x 1)
% t2 = (K1 x K2)
%
% K1=Dd*Np
% D=Dp*Np
% Dp,Np,Dd,K2 unique
%
% Output:
% dJ_dt1 = gradient (Dp x Dd x Np)
Dp = size(t1,1);
[Np, Dd] = size(a_l2);
K2 = length(c);
t2_tensor = reshape(t2, Dd, Np, K2); %Dd x Np x K2
x_parts = reshape(x, [Dp, Np]); %Dp x Np
t1 = permute(t1,[1 3 2]); %Dp x Np x Dd
a_l2=a_l2'; %Dd x Np <-> j,i
z_l1=z_l1'; %Dd x Np <-> j,i
tempvar_k2 = -4*c.*exp(-z_l2); % K2 x 1
partialsum = bsxfun(@minus,a_l2,t2_tensor); %Dd x Np x K2
partialsum = permute(partialsum,[3 1 2]); %K2 x Dd x Np
partialsum = squeeze(sum(bsxfun(@times,tempvar_k2,partialsum),1)); %Dd x Np
tempvar_lastterm = bsxfun(@minus,x_parts,t1); %Dp x Np x Dd
tempvar_lastterm = permute(tempvar_lastterm,[3 2 1]); %Dd x Np x Dp
dJ_dt1 = bsxfun(@times,partialsum.*exp(-z_l1),tempvar_lastterm); %Dd x Np x Dp
tempout=tempvar_lastterm;
dJ_dt1 = permute(dJ_dt1,[3 1 2]); %Dp x Dd x Np
end
Note that this is almost identical to the original vectorized version, only the dimensions of x are changed and some indices have been permuted.
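For completeness, a sketch of how this updated version plugs into the small test case from the question (assuming x, c, t1, t2 are set up as in the question's Test 1 and that the question's f_star_loops and compute_numerical_derivatives are on the path):

[f, z_l1, z_l2, a_l2, a_l3] = f_star_loops(x, c, t1, t2);
[df_dt1_vect, ~] = compute_t1_gradient_vect(t1, x, z_l1, z_l2, a_l2, c, t2);
df_dt1_num = compute_numerical_derivatives(x, c, t1, t2, 1e-10);
max(abs(df_dt1_vect(:) - df_dt1_num(:)))   % should be small if the analytic gradient is right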
Upvotes: 1