Reputation: 317
I'm trying to perform a partial least squares (PLS) regression analysis in C#. MATLAB's plsregress uses the SIMPLS algorithm and returns beta, the matrix of regression coefficients.
I do not understand why the coefficient matrices differ between MATLAB and the C# version. Is there a mistake in the way I pass the input to the C# version?
The inputs are the same in both cases and are taken from the paper referenced below.
Minimal working example:
MATLAB: following the small example by Hervé Abdi (Hervé Abdi, Partial Least Square Regression). References: PDF
clear all;
clc;
inputs = [7, 7, 13, 7; 4, 3, 14, 7; 10, 5, 12, 5; 16, 7, 11, 3; 13, 3, 10, 3];
outputs = [14, 7, 8; 10, 7, 6; 8, 5, 5; 2, 4,7; 6, 2, 4];
[XL,yl,XS,YS,beta,PCTVAR] = plsregress(inputs,outputs, 1);
disp 'beta'
beta
disp 'beta size'
size(beta)
yfit = [ones(size(inputs,1),1) inputs]*beta;
residuals = outputs - yfit;
% stem(residuals)
% xlabel('Observation');
% ylabel('Residual');
beta =
1.0484e+01 6.1899e+00 6.2841e+00
-6.3488e-01 -3.0405e-01 -7.2608e-02
2.1949e-02 1.0512e-02 2.5102e-03
1.9226e-01 9.2078e-02 2.1988e-02
2.8948e-01 1.3864e-01 3.3107e-02
Accord.NET:
double[][] inputs = new double[][]
{
// Wine | Price | Sugar | Alcohol | Acidity
new double[] { 7, 7, 13, 7 },
new double[] { 4, 3, 14, 7 },
new double[] { 10, 5, 12, 5 },
new double[] { 16, 7, 11, 3 },
new double[] { 13, 3, 10, 3 },
};
double[][] outputs = new double[][]
{
// Wine | Hedonic | Goes with meat | Goes with dessert
new double[] { 14, 7, 8 },
new double[] { 10, 7, 6 },
new double[] { 8, 5, 5 },
new double[] { 2, 4, 7 },
new double[] { 6, 2, 4 },
};
var pls = new PartialLeastSquaresAnalysis()
{
Method = AnalysisMethod.Center,
Algorithm = PartialLeastSquaresAlgorithm.NIPALS
};
var regression = pls.Learn(inputs, outputs);
double[][] coeffs = regression.Weights;
>>
-1.69811320754717 -0.0566037735849056 0.0707547169811322
1.27358490566038 0.29245283018868 0.571933962264151
-4 1 0.5
1.17924528301887 0.122641509433962 0.159198113207547
Upvotes: 1
Views: 1051
Reputation: 2118
I think there are at least three discrepancies between the way the MATLAB and Accord.NET versions of PLS are being called.
1. As you mention, MATLAB is using SIMPLS. However, Accord.NET is being told to use NIPALS.
2. The MATLAB version is being called as plsregress(inputs, outputs, 1), meaning the regression is computed using only 1 latent component, but Accord.NET has not been instructed to do the same.
3. Accord.NET returns a MultivariateLinearRegression object that contains a matrix of weights and a separate vector of intercepts, whereas MATLAB returns the intercepts as the first row of the coefficient matrix beta.
Once all of these are taken into consideration, it is possible to reproduce exactly the same results as the MATLAB version:
double[][] inputs = new double[][]
{
// Wine | Price | Sugar | Alcohol | Acidity
new double[] { 7, 7, 13, 7 },
new double[] { 4, 3, 14, 7 },
new double[] { 10, 5, 12, 5 },
new double[] { 16, 7, 11, 3 },
new double[] { 13, 3, 10, 3 },
};
double[][] outputs = new double[][]
{
// Wine | Hedonic | Goes with meat | Goes with dessert
new double[] { 14, 7, 8 },
new double[] { 10, 7, 6 },
new double[] { 8, 5, 5 },
new double[] { 2, 4, 7 },
new double[] { 6, 2, 4 },
};
// Create the Partial Least Squares Analysis
var pls = new PartialLeastSquaresAnalysis()
{
Method = AnalysisMethod.Center,
Algorithm = PartialLeastSquaresAlgorithm.SIMPLS, // First change: use SIMPLS
};
// Learn the analysis
pls.Learn(inputs, outputs);
// Second change: Use just 1 latent factor/component
var regression = pls.CreateRegression(factors: 1);
// Third change: present results as in MATLAB
double[][] w = regression.Weights.Transpose();
double[] b = regression.Intercepts;
// Insert the intercepts as the first column of the transposed weights,
// then transpose back so they form the first row, as in MATLAB's beta
double[][] coeffs = (w.InsertColumn(b, index: 0)).Transpose();
// Show results in MATLAB format
string str = coeffs.ToOctave();
With those changes, the coeffs matrix above should become
[ 10.4844779770616 6.18986077674717 6.28413863347486 ;
-0.634878923091644 -0.304054829845448 -0.0726082626993539 ;
0.0219492754418065 0.0105118991463605 0.00251024045589416 ;
0.192261724966225 0.0920775662006966 0.0219881135215502 ;
0.289484835410222 0.13863944631343 0.033107085796122 ]
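As a quick sanity check, here is a minimal sketch in plain C# (no Accord.NET calls) of how a coefficient matrix in this layout is applied to the inputs. It mirrors the MATLAB line yfit = [ones(size(inputs,1),1) inputs]*beta from the question. The class name PlsFitSketch and the method Predict are made up for illustration, and the coefficients are the rounded values from the MATLAB output in the question, so the fitted values will only be approximate.

```csharp
using System;

static class PlsFitSketch
{
    // Computes [1, x] * beta for a single observation x.
    // beta is in MATLAB layout: row 0 holds the intercepts,
    // rows 1..p hold the weights for each of the p inputs.
    public static double[] Predict(double[][] beta, double[] x)
    {
        int outputCount = beta[0].Length;
        var y = new double[outputCount];
        for (int j = 0; j < outputCount; j++)
        {
            y[j] = beta[0][j];                  // intercept for output j
            for (int i = 0; i < x.Length; i++)
                y[j] += x[i] * beta[i + 1][j];  // weight of input i on output j
        }
        return y;
    }

    static void Main()
    {
        // Rounded coefficients copied from the MATLAB output in the question
        double[][] beta =
        {
            new[] { 10.484,    6.1899,   6.2841    },
            new[] { -0.63488, -0.30405, -0.072608  },
            new[] {  0.021949, 0.010512, 0.0025102 },
            new[] {  0.19226,  0.092078, 0.021988  },
            new[] {  0.28948,  0.13864,  0.033107  },
        };

        double[][] inputs =
        {
            new double[] {  7, 7, 13, 7 },
            new double[] {  4, 3, 14, 7 },
            new double[] { 10, 5, 12, 5 },
            new double[] { 16, 7, 11, 3 },
            new double[] { 13, 3, 10, 3 },
        };

        // One fitted row per wine, three outputs each
        foreach (double[] wine in inputs)
            Console.WriteLine(string.Join("  ", Predict(beta, wine)));
    }
}
```

Subtracting these fitted rows from the outputs gives the residuals that the MATLAB script computes as outputs - yfit.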
Upvotes: 2