Austin

Reputation: 7329

Understanding a FastICA implementation

I'm trying to implement FastICA (independent component analysis) for blind signal separation of images, but first I thought I'd take a look at some examples from GitHub that produce good results. I'm trying to compare one implementation's main loop against the algorithm's steps on Wikipedia's FastICA page, and I'm having quite a bit of difficulty seeing how they're actually the same.

They look very similar, but there are a few differences that I don't understand. It looks like this implementation is similar to (or the same as) the "Multiple component extraction" version from the Wiki article.

Would someone please help me understand what's going on in the four or so lines having to do with the nonlinearity function with its first and second derivatives, and the first line of updating the weight vector? Any help is greatly appreciated!

Here's the implementation with the variables changed to mirror the Wiki more closely:

% X is the (N x M) = (3 x 50K) mixed image data matrix
% (one row for each mixed image)

C = 3; % number of components to separate

W = zeros(numofIC, VariableNum); % weights matrix

for p = 1:C

    % initialize random weight vector of length N
    wp = rand(C, 1);
    wp = wp / norm(wp);

    i = 1;
    maxIterations = 100;
    while i <= maxIterations+1

        % stop after max iterations
        if i == maxIterations
            fprintf('Component %d: no convergence after %d iterations\n', p, maxIterations);
            break;
        end

        wp_old = wp;

        % this is the main part of the algorithm and where
        % I'm confused about the particular implementation
        u = 1;
        t = X'*b;
        g = t.^3;
        dg = 3*t.^2;
        wp = ((1-u)*t'*g*wp + u*X*g)/M - mean(dg)*wp;

        % the 2nd and 3rd wp update steps make sense to me
        wp = wp - W*W'*wp;
        wp = wp / norm(wp);

        % or stop once w_p converges
        if abs(abs(b'*bOld) - 1) < 1e-10
            W(:,p) = b;
            break;
        end

        i = i + 1;
    end
end

And the Wiki algorithms for quick reference:

[Image: Wikipedia's FastICA algorithm steps]

Upvotes: 3

Views: 1655

Answers (1)

Anthony

Reputation: 3793

First, I don't understand why the term that is always zero remains in the code:

wp = ((1-u)*t'*g*wp+u*X*g)/M-mean(dg)*wp;

The above can be simplified into:

wp = X*g/M-mean(dg)*wp;

I have also removed u, since it is always 1.
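As a quick sanity check, here's a NumPy sketch (with toy random data of my own, not anything from the post) confirming that the two expressions coincide when u = 1:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 3, 1000
X = rng.standard_normal((N, M))   # stand-in for the mixed data, one row per image
wp = rng.standard_normal(N)
wp /= np.linalg.norm(wp)          # random unit-norm weight vector

u = 1.0
t = X.T @ wp                      # w'X, one value per sample
g = t ** 3                        # nonlinearity g(w'X)
dg = 3 * t ** 2                   # its derivative g'(w'X)

# Original update line, transcribed from the MATLAB
full = ((1 - u) * (t @ g) * wp + u * (X @ g)) / M - dg.mean() * wp
# Simplified update: the (1-u)*t'*g*wp term is always zero
simplified = X @ g / M - dg.mean() * wp

print(np.allclose(full, simplified))  # prints True
```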

Second, I believe the following line is wrong:

t = X'*b;

The correct expression is:

t = X'*wp;

Now let's go through each variable. Let's refer to

    w⁺ = E{X g(wᵀX)ᵀ} − E{g′(wᵀX)} w

as the iteration equation.

  • X is your input data, i.e. X in the iteration equation.

  • wp is the weight vector, i.e. w in the iteration equation. Its initial value is randomised.

  • g is the first derivative of a nonquadratic nonlinear function, i.e. g(wᵀX) in the iteration equation.

  • dg is the first derivative of g, i.e. g′(wᵀX) in the iteration equation.

  • M is not defined in the code you posted, but I believe it should be the number of samples, i.e. the number of columns of X.


Knowing the meaning of all the variables, we can now work through the code.

    t = X'*wp;

The above line (with the correction applied) computes wᵀX.

    g = t.^3; 

The above line computes g(wᵀX) = (wᵀX)³. Note that g(u) can be any function as long as G(u), where g(u) = dG(u)/du, is nonlinear and nonquadratic.

    dg = 3*t.^2; 

The above line computes the derivative of g.
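Here g is the cube (kurtosis) nonlinearity. The FastICA literature also uses other g/g′ pairs; a small NumPy sketch of the standard choices (my own illustration, not from the post):

```python
import numpy as np

t = np.linspace(-3, 3, 601)          # stand-in for the w'X values

# g(u) = u^3, g'(u) = 3u^2, from the kurtosis contrast G(u) = u^4/4
g_cube, dg_cube = t ** 3, 3 * t ** 2

# g(u) = tanh(u), g'(u) = 1 - tanh(u)^2, from G(u) = log cosh(u)
g_tanh, dg_tanh = np.tanh(t), 1 - np.tanh(t) ** 2

# g(u) = u exp(-u^2/2), g'(u) = (1 - u^2) exp(-u^2/2), from G(u) = -exp(-u^2/2)
g_gauss = t * np.exp(-t ** 2 / 2)
dg_gauss = (1 - t ** 2) * np.exp(-t ** 2 / 2)
```

Any of these pairs could replace the `g = t.^3; dg = 3*t.^2;` lines, as long as g and dg are kept consistent with each other.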

    wp = X*g/M-mean(dg)*wp;

X*g computes X g(wᵀX), and dividing by M averages it over the M samples, which is the sample estimate of E{X g(wᵀX)ᵀ}.

mean(dg) is the sample estimate of E{g′(wᵀX)}, and it multiplies wp, i.e. w in the iteration equation.

Now you have everything needed for the approximate Newton (fixed-point) iteration that FastICA is built on.
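Putting the corrections together, the whole deflation loop can be sketched as follows (a NumPy illustration with toy mixed signals of my own invention; note that FastICA also assumes X has been centred and whitened first):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem: mix two uniform (non-Gaussian) sources
S = rng.uniform(-1, 1, size=(2, 5000))
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])
Xraw = A @ S

# Centre and whiten -- FastICA assumes zero-mean, unit-covariance input
Xc = Xraw - Xraw.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
X = E @ np.diag(d ** -0.5) @ E.T @ Xc
N, M = X.shape

C = N                                    # number of components to extract
W = np.zeros((N, C))
for p in range(C):
    wp = rng.standard_normal(N)
    wp /= np.linalg.norm(wp)
    for _ in range(100):                 # max iterations
        wp_old = wp
        t = X.T @ wp                     # the corrected t = X'*wp
        g = t ** 3
        dg = 3 * t ** 2
        wp = X @ g / M - dg.mean() * wp  # fixed-point update
        wp = wp - W @ W.T @ wp           # deflation: decorrelate from earlier components
        wp /= np.linalg.norm(wp)
        if abs(abs(wp @ wp_old) - 1) < 1e-10:  # converged (up to sign)
            break
    W[:, p] = wp

Y = W.T @ X                              # estimated sources, one per row
```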

Upvotes: 1
