matlab mldivide not giving the same answer as direct pseudo inverse

Question

The problem is very simple: I am fitting a quartic curve to some noisy data. mldivide gives the wrong set of values for the coefficients, while computing the pseudoinverse directly works.

Quartic equation: c0 + c1*t + c2*t^2 + c3*t^3 + c4*t^4

The input file contains t and the actual value for each sample.

fid = fopen('Data_Corr.txt');

A = zeros(4001,5);
b = zeros(4001,1); % column vector, so that A'*b is well defined

for i = 1:4001
    dataPt = fscanf(fid,'%f',2);
    A(i,:) = [1 dataPt(1) dataPt(1)^2 dataPt(1)^3 dataPt(1)^4];
    b(i) = dataPt(2);
end
fclose(fid);

%c = b\A; %using matlab mldivide
c = inv(A'*A)*A'*b; %computing pseudo inverse directly

d = A*c; % fitted values for all samples at once

figure; hold on; grid on;
plot(b,'-b');
plot(d,'r-');

A. Donda · Accepted Answer

If I interpret your formula and your code correctly, A holds the powers of t and b contains the data you are trying to model. Using c for the coefficient vector, the equation you want to solve is therefore

b = A * c

Solving this using mldivide leads to

c = A \ b

while solving it using pinv results in

c = pinv(A) * b

Your code contains a line corresponding to the second of these two equations, in the form c = inv(A'*A)*A'*b, which equals pinv(A)*b whenever A has full column rank. Your use of the backslash operator, however, is the wrong way around: you wrote b\A where it should be A\b.

I strongly suspect that your mismatch comes from this error.
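To make the operand order concrete, here is a minimal sketch on synthetic data. The sample times t, the coefficients c_true, and the noise level are made up for illustration and are not taken from Data_Corr.txt.

```matlab
% Synthetic quartic data (assumed values, for illustration only)
rng(0);                                  % reproducible noise
t      = linspace(-1, 1, 4001)';         % hypothetical sample times
c_true = [1; -2; 0.5; 3; -1];            % hypothetical coefficients c0..c4
A      = [ones(size(t)) t t.^2 t.^3 t.^4];
b      = A*c_true + 0.01*randn(size(t)); % column vector of noisy samples

c_mldivide = A \ b;                      % least-squares solution of A*c = b
c_pinv     = pinv(A) * b;                % same solution via the pseudoinverse
c_normal   = inv(A'*A) * A' * b;         % normal equations, as in the question

disp(max(abs(c_mldivide - c_pinv)));     % difference at round-off level
disp(max(abs(c_mldivide - c_normal)));   % small here, since A is well conditioned

x_wrong = b \ A;                         % solves b*x = A instead of A*c = b
disp(size(x_wrong));                     % a 1-by-5 row, not the fit coefficients
```

With the operands swapped, the backslash still returns five numbers, which is probably why the mistake is easy to miss: the result has a plausible size but answers a different least-squares problem.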

If the mismatch still exists after correcting it, the second possibility is that your linear system is not sufficiently determined by the data, so that more than one coefficient vector fits equally well. In that case pinv returns the solution with the minimum sum of squared coefficients, sum(c .^ 2), while mldivide returns a solution in which some coefficients are exactly zero, a "sparse" solution. How exactly this solution is arrived at depends, as @Dmitry pointed out, on the precise properties of the matrices involved. An example:

A = [1 2 0; 0 4 3];
b = [8; 18];
c_mldivide = A \ b
c_pinv = pinv(A) * b

gives the output

c_mldivide =

                     0
                     4
      0.66666666666667


c_pinv =

     0.918032786885245
      3.54098360655738
      1.27868852459016

Because the 3 coefficients of the system are constrained by only 2 data points, one of them can be effectively freely chosen.
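A short check of this example, as a sketch using only built-in functions: both vectors reproduce b, pinv gives the one with the smaller norm, and their difference lies along the single null-space direction of A.

```matlab
A = [1 2 0; 0 4 3];
b = [8; 18];
c_mldivide = A \ b;
c_pinv     = pinv(A) * b;

% Both vectors satisfy the two equations (residuals at round-off level)
disp(norm(A*c_mldivide - b));
disp(norm(A*c_pinv - b));

% pinv returns the solution with the smallest Euclidean norm
disp(norm(c_mldivide));
disp(norm(c_pinv));

% The two solutions differ only by a multiple of the null-space vector of A
n     = null(A);                          % 3-by-1 orthonormal basis
delta = c_mldivide - c_pinv;
alpha = n' * delta;                       % component along the null space
disp(norm(delta - alpha*n));              % close to zero
```

Adding any multiple of n to a solution gives another exact solution, which is the sense in which one coefficient is free.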

I doubt that this applies to your problem, though, because a linear system with 5 coefficients should be well determined by 4001 data points, unless the data are highly redundant.
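One way to check whether the data really determine all 5 coefficients is to look at the rank and condition number of A. The sketch below uses hypothetical sample times; with the real data, A would be the matrix built in the question.

```matlab
% Well-spread samples (hypothetical times, for illustration only)
t = linspace(0, 10, 4001)';
A = [ones(size(t)) t t.^2 t.^3 t.^4];
disp(rank(A));       % 5 means all columns are numerically independent
disp(cond(A));       % a very large value means poorly determined coefficients

% Highly redundant data: every sample taken at one of only 3 distinct times
t_red = repmat([1; 2; 3], 100, 1);
A_red = [ones(size(t_red)) t_red t_red.^2 t_red.^3 t_red.^4];
disp(rank(A_red));   % at most 3, so 5 coefficients cannot all be determined
```

Forming A'*A squares the condition number of A, which is one reason A \ b is generally the safer choice than inv(A'*A)*A'*b when the columns are nearly dependent.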
