Reputation: 56
I've been using Apache math for a while to do a multiple linear regression using OLSMultipleLinearRegression. Now I need to extend my solution to include a weighting factor for each data point.
I'm trying to replicate the MATLAB function fitlm.
I have a MATLAB call like:
table_data = table(points_scored, height, weight, age);
model = fitlm( table_data, 'points_scored ~ -1, height, weight, age', 'Weights', data_weights)
From 'model' I get the regression coefficients for height, weight, age.
In Java the code I have now is (roughly):
double[][] variables = double[grades.length][3];
// Fill in variables for height, weight, age,
...
OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
regression.setNoIntercept(true);
regression.newSampleData(points_scored, variables);
There does not appear to be a way to add weightings to OLSMultipleLinearRegression. There does appear to be a way to add weights to the LeastSquaresBuilder. However I'm having trouble figuring out exactly how to use this. My biggest problem (I think) is creating the jacobians that are expected.
Here is most of what I tried:
double[] points_scored = //fill in points scored
double[] height = //fill in
double[] weight = //fill in
double[] age = // fill in
MultivariateJacobianFunction distToResidual= coeffs -> {
RealVector value = new ArrayRealVector(points_scored.length);
RealMatrix jacobian = new Array2DRowRealMatrix(points_scored.length, 3);
for (int i = 0; i < measures.length; ++i) {
double residual = points_scored[i];
residual -= coeffs.getEntry(0) * height[i];
residual -= coeffs.getEntry(1) * weight[i];
residual -= coeffs.getEntry(2) * age[i];
value.setEntry(i, residual);
//No idea how to set up the jacobian here
}
return new Pair<RealVector, RealMatrix>(value, jacobian);
};
double[] prescribedDistancesToLine = new double[measures.length];
Arrays.fill(prescribedDistancesToLine, 0);
double[] starts = new double[] {1, 1, 1};
LeastSquaresProblem problem = new LeastSquaresBuilder().
start(starts).
model(distToResidual).
target(prescribedDistancesToLine).
lazyEvaluation(false).
maxEvaluations(1000).
maxIterations(1000).
build();
LeastSquaresOptimizer.Optimum optimum = new LevenbergMarquardtOptimizer().optimize(problem);
Since I don't know how to make the jacobian values I've just been stabbing in the dark and getting coefficient nowhere near the MATLAB answers. Once I get this part working I know that adding the weights should be a pretty straight forward extra line int the LeastSquaresBuilder.
Thanks for any help in advance!
Upvotes: 2
Views: 1609
Reputation: 133
You can use class GLSMultipleLinearRegression from Apache math. For example, let find linear regression for three plane data points (0, 0), (1, 2), (2, 0) with weights 1, 2, 1:
import org.apache.commons.math3.stat.regression.GLSMultipleLinearRegression;
public class Main {
public static void main(String[] args) {
GLSMultipleLinearRegression regr = new GLSMultipleLinearRegression();
regr.setNoIntercept(false);
double[] y = new double[]{0.0, 2.0, 0.0};
double[][] x = new double[3][];
x[0] = new double[]{0.0};
x[1] = new double[]{1.0};
x[2] = new double[]{2.0};
double[][] omega = new double[3][];
omega[0] = new double[]{1.0, 0.0, 0.0};
omega[1] = new double[]{0.0, 0.5, 0.0};
omega[2] = new double[]{0.0, 0.0, 1.0};
regr.newSampleData(y, x, omega);
double[] params = regr.estimateRegressionParameters();
System.out.println("Slope: " + params[1] + ", intercept: " + params[0]);
}
}
Note that the omega
matrix is diagonal, and its diagonal elements are reciprocal weights.
View the documentation for multi-variable case.
Upvotes: 1