Reputation: 153
I am using PCA and I found that PCA in sklearn in Python and pca() in Matlab produce different results. Here is the test matrix I am using.
import numpy as np
from sklearn.decomposition import PCA

a = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
For Python sklearn, I got
p = PCA()
print(p.fit_transform(a))
[[-1.38340578 0.2935787 ]
[-2.22189802 -0.25133484]
[-3.6053038 0.04224385]
[ 1.38340578 -0.2935787 ]
[ 2.22189802 0.25133484]
[ 3.6053038 -0.04224385]]
For Matlab, I got
pca(a', 'Centered', false)
[0.2196 0.5340
0.3526 -0.4571
0.5722 0.0768
-0.2196 -0.5340
-0.3526 0.4571
-0.5722 -0.0768]
Why is such a difference observed?
Thanks to Dan for the answer. The results look quite reasonable now. However, if I test with a random matrix, it seems that Matlab and Python produce results that are not scalar multiples of each other. Why does this happen?
test matrix a:
[[ 0.36671885 0.77268624 0.94687497]
[ 0.75741855 0.63457672 0.88671836]
[ 0.20818031 0.709373 0.45114135]
[ 0.24488718 0.87400025 0.89382836]
[ 0.16554686 0.74684393 0.08551401]
[ 0.07371664 0.1632872 0.84217978]]
Python results:
p = PCA()
print(p.fit_transform(a))
[[ 0.25305509 -0.10189215 -0.11661895]
[ 0.36137036 -0.20480169 0.27455458]
[-0.25638649 -0.02923213 -0.01619661]
[ 0.14741593 -0.12777308 -0.2434731 ]
[-0.6122582 -0.08568121 0.06790961]
[ 0.10680331 0.54938026 0.03382447]]
Matlab results:
pca(a', 'Centered', false)
0.504156973865138 -0.0808159771243340 -0.107296852182663
0.502756555190181 -0.174432053627297 0.818826939851221
0.329948209311847 0.315668718703861 -0.138813345638127
0.499181592718705 0.0755364557146097 -0.383301081533716
0.232039797509016 0.694464307249012 -0.0436361728092353
0.284905319274925 -0.612706345940607 -0.387190971583757
Thanks to Dan for his help all through this. It turned out to be a misuse of the Matlab function: Matlab returns the principal component coefficients by default, not the transformed data. Using [~, score] = pca(a, 'Centered', true) gives the same results as Python.
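To make the coefficients-versus-scores distinction concrete, here is a minimal Python sketch (assuming numpy and scikit-learn are installed) showing that projecting the centered data onto the coefficients reproduces the scores that fit_transform returns:

import numpy as np
from sklearn.decomposition import PCA

a = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

p = PCA()
scores = p.fit_transform(a)  # what sklearn returns: the scores

# p.components_ holds one principal component per row; its transpose
# corresponds to the coefficient matrix that MATLAB's pca returns first.
coeff = p.components_.T

# Projecting the centered data onto the coefficients recovers the scores,
# i.e. MATLAB's second output: [coeff, score] = pca(a)
manual_scores = (a - a.mean(axis=0)) @ coeff
print(np.allclose(scores, manual_scores))  # True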
Upvotes: 0
Views: 2227
Reputation: 45752
PCA works off eigenvectors. So long as the vectors are parallel, the magnitude is irrelevant (it is just a different normalization).
In your case, the two are scalar multiples of each other. Try (in MATLAB)
Python = [-1.38340578 0.2935787
-2.22189802 -0.25133484
-3.6053038 0.04224385
1.38340578 -0.2935787
2.22189802 0.25133484
3.6053038 -0.04224385]
Matlab = [ 0.2196 0.5340
0.3526 -0.4571
0.5722 0.0768
-0.2196 -0.5340
-0.3526 0.4571
-0.5722 -0.0768]
Now notice that Matlab(:,1)*-6.2997 is basically equal to Python(:,1). Or, put another way, Python(:,n)./Matlab(:,n) gives you (roughly) the same number for each row. This means the two vectors have the same direction (i.e. they are just scalar multiples of each other), and so you are getting the same principal components.
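If you would rather check the ratio numerically than by eye, a small Python sketch (assuming numpy; the two matrices are just the outputs copied from above) does the same column-wise division:

import numpy as np

python_pc = np.array([[-1.38340578, 0.2935787],
                      [-2.22189802, -0.25133484],
                      [-3.6053038, 0.04224385],
                      [1.38340578, -0.2935787],
                      [2.22189802, 0.25133484],
                      [3.6053038, -0.04224385]])

matlab_pc = np.array([[0.2196, 0.5340],
                      [0.3526, -0.4571],
                      [0.5722, 0.0768],
                      [-0.2196, -0.5340],
                      [-0.3526, 0.4571],
                      [-0.5722, -0.0768]])

# Each column of the ratio is (roughly) constant, so the corresponding
# components point in the same direction and differ only by scale.
print(python_pc / matlab_pc)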
See here for another example: https://math.stackexchange.com/a/1183707/118848
Upvotes: 7