Reputation: 153
I am using PCA and I found that PCA in sklearn in Python and pca() in Matlab produce different results. Here is the test matrix I am using.
import numpy as np
from sklearn.decomposition import PCA

a = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
For Python sklearn, I got
p = PCA()
print(p.fit_transform(a))
[[-1.38340578 0.2935787 ]
[-2.22189802 -0.25133484]
[-3.6053038 0.04224385]
[ 1.38340578 -0.2935787 ]
[ 2.22189802 0.25133484]
[ 3.6053038 -0.04224385]]
For Matlab, I got
pca(a', 'Centered', false)
[0.2196 0.5340
0.3526 -0.4571
0.5722 0.0768
-0.2196 -0.5340
-0.3526 0.4571
-0.5722 -0.0768]
Why is such a difference observed?
Thanks to Dan for the answer. The results look quite reasonable now. However, if I test with a random matrix, it seems that Matlab and Python produce results that are not scalar multiples of each other. Why does this happen?
test matrix a:
[[ 0.36671885 0.77268624 0.94687497]
[ 0.75741855 0.63457672 0.88671836]
[ 0.20818031 0.709373 0.45114135]
[ 0.24488718 0.87400025 0.89382836]
[ 0.16554686 0.74684393 0.08551401]
[ 0.07371664 0.1632872 0.84217978]]
Python results:
p = PCA()
print(p.fit_transform(a))
[[ 0.25305509 -0.10189215 -0.11661895]
[ 0.36137036 -0.20480169 0.27455458]
[-0.25638649 -0.02923213 -0.01619661]
[ 0.14741593 -0.12777308 -0.2434731 ]
[-0.6122582 -0.08568121 0.06790961]
[ 0.10680331 0.54938026 0.03382447]]
Matlab results:
pca(a', 'Centered', false)
0.504156973865138 -0.0808159771243340 -0.107296852182663
0.502756555190181 -0.174432053627297 0.818826939851221
0.329948209311847 0.315668718703861 -0.138813345638127
0.499181592718705 0.0755364557146097 -0.383301081533716
0.232039797509016 0.694464307249012 -0.0436361728092353
0.284905319274925 -0.612706345940607 -0.387190971583757
Thanks to Dan for his help all through this. It turned out to be a misuse of the Matlab function: Matlab returns the principal component coefficients by default, not the transformed data. Using [~, score] = pca(a, 'Centered', true) gives the same results as Python.
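To make the coefficients-versus-scores distinction concrete, here is a minimal Python sketch (assuming numpy and scikit-learn are installed) showing that projecting the centered data onto the coefficients reproduces the scores that fit_transform returns:

import numpy as np
from sklearn.decomposition import PCA

a = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

p = PCA()
scores = p.fit_transform(a)  # what sklearn returns: the scores

# p.components_ holds one principal component per row; its transpose
# corresponds to the coefficient matrix that MATLAB's pca returns first.
coeff = p.components_.T

# Projecting the centered data onto the coefficients recovers the scores,
# i.e. MATLAB's second output: [coeff, score] = pca(a)
manual_scores = (a - a.mean(axis=0)) @ coeff
print(np.allclose(scores, manual_scores))  # True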
Upvotes: 0
Views: 2227
Reputation: 45752
PCA works off eigenvectors. So long as the vectors are parallel, the magnitude is irrelevant (it is just a different normalization).
In your case, the two are scalar multiples of each other. Try (in MATLAB)
Python = [-1.38340578 0.2935787
-2.22189802 -0.25133484
-3.6053038 0.04224385
1.38340578 -0.2935787
2.22189802 0.25133484
3.6053038 -0.04224385]
Matlab = [ 0.2196 0.5340
0.3526 -0.4571
0.5722 0.0768
-0.2196 -0.5340
-0.3526 0.4571
-0.5722 -0.0768]
Now notice that Matlab(:,1)*-6.2997 is basically equal to Python(:,1). Or, put another way, Python(:,n)./Matlab(:,n) gives you (roughly) the same number for each row. This means the two vectors have the same direction (i.e. they are just scalar multiples of each other), and so you are getting the same principal components.
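If you would rather check the ratio numerically than by eye, a small Python sketch (assuming numpy; the two matrices are just the outputs copied from above) does the same column-wise division:

import numpy as np

python_pc = np.array([[-1.38340578, 0.2935787],
                      [-2.22189802, -0.25133484],
                      [-3.6053038, 0.04224385],
                      [1.38340578, -0.2935787],
                      [2.22189802, 0.25133484],
                      [3.6053038, -0.04224385]])

matlab_pc = np.array([[0.2196, 0.5340],
                      [0.3526, -0.4571],
                      [0.5722, 0.0768],
                      [-0.2196, -0.5340],
                      [-0.3526, 0.4571],
                      [-0.5722, -0.0768]])

# Each column of the ratio is (roughly) constant, so the corresponding
# components point in the same direction and differ only by scale.
print(python_pc / matlab_pc)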
See here for another example: https://math.stackexchange.com/a/1183707/118848
Upvotes: 7