user3451228

Reputation: 153

Matlab and Python produce different results for PCA

I am using PCA, and I found that sklearn's PCA in Python and pca() in Matlab produce different results. Here is the test matrix I am using.

import numpy as np

a = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

For Python sklearn, I got

from sklearn.decomposition import PCA

p = PCA()
print(p.fit_transform(a))

[[-1.38340578  0.2935787 ]
[-2.22189802 -0.25133484]
[-3.6053038   0.04224385]
[ 1.38340578 -0.2935787 ]
[ 2.22189802  0.25133484]
[ 3.6053038  -0.04224385]]

For Matlab, I got

pca(a', 'Centered', false)

[0.2196    0.5340
0.3526   -0.4571
0.5722    0.0768
-0.2196   -0.5340
-0.3526    0.4571
-0.5722   -0.0768]

Why is such a difference observed?


Thanks to Dan for the answer. The results look quite reasonable now. However, if I test with a random matrix, Matlab and Python seem to produce results that are not scalar multiples of each other. Why does this happen?

test matrix a:

[[ 0.36671885  0.77268624  0.94687497]
[ 0.75741855  0.63457672  0.88671836]
[ 0.20818031  0.709373    0.45114135]
[ 0.24488718  0.87400025  0.89382836]
[ 0.16554686  0.74684393  0.08551401]
[ 0.07371664  0.1632872   0.84217978]]

Python results:

p = PCA()
print(p.fit_transform(a))

[[ 0.25305509 -0.10189215 -0.11661895]
[ 0.36137036 -0.20480169  0.27455458]
[-0.25638649 -0.02923213 -0.01619661]
[ 0.14741593 -0.12777308 -0.2434731 ]
[-0.6122582  -0.08568121  0.06790961]
[ 0.10680331  0.54938026  0.03382447]]

Matlab results:

pca(a', 'Centered', false)

0.504156973865138   -0.0808159771243340 -0.107296852182663
0.502756555190181   -0.174432053627297  0.818826939851221
0.329948209311847   0.315668718703861   -0.138813345638127
0.499181592718705   0.0755364557146097  -0.383301081533716
0.232039797509016   0.694464307249012   -0.0436361728092353
0.284905319274925   -0.612706345940607  -0.387190971583757

Thanks to Dan for his help all through this. In fact, I found it was a misuse of the Matlab function: Matlab's pca returns the principal component coefficients by default, not the projected data. Using [~, score] = pca(a, 'Centered', true) gives the same results as Python.
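The coefficient/score distinction can be sanity-checked in Python alone: the score output (what sklearn's fit_transform returns) is just the centered data projected onto the coefficients. A minimal numpy sketch, comparing against the sklearn output quoted in the question:

```python
import numpy as np

a = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]], dtype=float)

# Matlab's pca(a) returns coeff (the loadings) by default;
# [~, score] = pca(a) returns the projected data, which is what
# sklearn's PCA().fit_transform(a) computes: center, then project.
centered = a - a.mean(axis=0)  # mean is already zero for this matrix
_, _, vt = np.linalg.svd(centered, full_matrices=False)
score = centered @ vt.T        # principal component scores

# sklearn's output from the question, for comparison
expected = np.array([[-1.38340578,  0.2935787 ],
                     [-2.22189802, -0.25133484],
                     [-3.6053038,   0.04224385],
                     [ 1.38340578, -0.2935787 ],
                     [ 2.22189802,  0.25133484],
                     [ 3.6053038,  -0.04224385]])

# Agreement holds up to a per-column sign convention,
# since singular vectors are only defined up to sign.
print(np.allclose(np.abs(score), np.abs(expected)))
```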

Upvotes: 0

Views: 2227

Answers (1)

Dan

Reputation: 45752

PCA works off eigenvectors. As long as the vectors are parallel, the magnitude is irrelevant (it is just a different normalization).

In your case, the two results are scalar multiples of each other. Try (in MATLAB):

Python = [-1.38340578  0.2935787
          -2.22189802 -0.25133484
          -3.6053038   0.04224385
          1.38340578  -0.2935787
          2.22189802   0.25133484
          3.6053038   -0.04224385]

Matlab = [ 0.2196    0.5340
           0.3526   -0.4571
           0.5722    0.0768
          -0.2196   -0.5340
          -0.3526    0.4571
          -0.5722   -0.0768]

Now notice that Matlab(:,1)*-6.2997 is basically equal to Python(:,1). Or put another way,

Python(:,n)./Matlab(:,n)

gives you (roughly) the same number for each row. This means each pair of columns has the same direction (i.e. they are just scalar multiples of each other), and so you are getting the same principal components.
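The same parallelism check can be run in numpy, using the values from the question's first example (the Python scores and the Matlab output): the elementwise ratio of each column pair should be roughly constant.

```python
import numpy as np

# sklearn's fit_transform output from the question
python_scores = np.array([[-1.38340578,  0.2935787 ],
                          [-2.22189802, -0.25133484],
                          [-3.6053038,   0.04224385],
                          [ 1.38340578, -0.2935787 ],
                          [ 2.22189802,  0.25133484],
                          [ 3.6053038,  -0.04224385]])

# Matlab's pca(a', 'Centered', false) output from the question
matlab_out = np.array([[ 0.2196,  0.5340],
                       [ 0.3526, -0.4571],
                       [ 0.5722,  0.0768],
                       [-0.2196, -0.5340],
                       [-0.3526,  0.4571],
                       [-0.5722, -0.0768]])

# Elementwise ratio: each column is (roughly) constant, so the
# columns are parallel, i.e. scalar multiples of each other.
ratio = python_scores / matlab_out
print(ratio)
```

The first column of the ratio hovers around -6.30 and the second around 0.55, which is exactly the scalar-multiple relationship described above.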

See here for another example: https://math.stackexchange.com/a/1183707/118848

Upvotes: 7
