Reputation: 1412
I have a set of N points in k dimensions, as a matrix of size N x k.
How can I find the best-fitting line through these points? The line will be a plane (hyperplane) in k dimensions. It will have k coefficients and one bias term.
Existing functions like fit seem to be usable only for points in 2 or 3 dimensions.
Upvotes: 3
Views: 2231
Reputation: 60799
Here's a simpler solution that just uses MATLAB's \ operator. We start by defining a plane in k dimensions:
% 0 = a + x(1) * b(1) + x(2) * b(2) + ... + x(k) * 1
k = 8;
a = randn(1);
b = randn(k-1,1);
(Note that we assume b(k) = 1; you can always multiply the plane parameters by any nonzero value without changing the plane.)
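For instance, a quick check of that scale invariance (my own addition; x0, c, r1 and r2 are made-up names):
% take a point x0 that lies exactly on the plane
x0 = rand(1,k-1);
x0(k) = -(a + x0(1:k-1) * b);
% the original and the scaled parameters both give (approximately) zero
c = 3;                                     % arbitrary nonzero scale factor
r1 =   a + x0(1:k-1) * b     +   x0(k);
r2 = c*a + x0(1:k-1) * (c*b) + c*x0(k);
disp([r1, r2])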
Next we generate N random points within this plane:
N = 1000;
x = rand(N,k-1);
x(:,k) = -(a + x * b);
(Sorry, this is not the best way to generate random points on the plane, but it's good enough for the demonstration here.) Add noise to the points:
x = x + 0.05*randn(size(x));
To find the parameters of the plane, we solve the system of equations
% a + x(1:k-1) * b == -x(k)
in the least-squares sense, with a and b as the unknowns. We can rewrite the left-hand side as [1,x(1:k-1)] * [a;b]. If we have a matrix equation M*p=v, we can solve for p by writing p=M\v:
p = [ones(N,1),x(:,1:k-1)]\(-x(:,k));
disp(['ground truth: [a,b,1] = ',mat2str([a,b',1],3)]);
disp(['estimated : [a,b,1] = ',mat2str([p',1],3)]);
This gives as output:
ground truth: [a,b,1] = [-1.35 -1.44 -1.48 1.17 0.226 -0.214 0.234 -1.59 1]
estimated : [a,b,1] = [-1.41 -1.38 -1.43 1.14 0.219 -0.195 0.221 -1.54 1]
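In the question's notation (k coefficients plus one bias term), these can be read off from p; a small note of my own, with made-up variable names:
% bias term and the k coefficients (the coefficient of x(k) was fixed to 1)
bias = p(1);
coefs = [p(2:end); 1];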
The less noise or the more points in the dataset, the smaller the error will be of course!
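To see that effect, here is a small experiment of my own (reusing N, k, a and b from above; noise_levels is a name I made up) that sweeps the noise level and reports the parameter error:
% repeat the fit for several noise levels and compare to the true parameters
noise_levels = [0.01 0.05 0.2];
for s = noise_levels
    xs = rand(N,k-1);
    xs(:,k) = -(a + xs * b);            % points exactly on the plane
    xs = xs + s*randn(size(xs));        % perturb with noise of standard deviation s
    ps = [ones(N,1), xs(:,1:k-1)] \ (-xs(:,k));
    fprintf('noise %.2f -> parameter error %.3g\n', s, norm(ps - [a;b]));
end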
Upvotes: 3
Reputation: 22304
You can fit a hyperplane (or any lower-dimensional affine space) to a set of D-dimensional data using Principal Component Analysis. Here's an example of fitting a plane to a set of 3D data. This is explained in more detail in the MATLAB documentation, but I tried to construct the simplest example I could.
% generate some random correlated data
D = 3;
mu = zeros(1,D);
sqrt_sig = randn(D);
sigma = sqrt_sig'*sqrt_sig;
% generate 50 points in a D x 50 matrix
X = mvnrnd(mu, sigma, 50)';
% perform PCA
coeff = pca(X');
% The last principal component is normal to the best-fit plane, and the plane goes through the mean of X
a = coeff(:,D);
b = -mean(X,2)'*a;
% plane defined by a'*x + b = 0
dist = abs(a'*X+b) / norm(a);
mse = mean(dist.^2)
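As mentioned above, the same idea extends to fitting a d-dimensional affine subspace for any d < D: keep the first d principal directions as a basis and measure the residual along the remaining ones. A rough sketch (d, B, Xc and resid are my own names):
% fit a d-dimensional affine subspace through the mean of X
d = 1;                              % e.g. a best-fit line through the 3D data
B = coeff(:,1:d);                   % orthonormal basis of the fitted subspace
Xc = bsxfun(@minus, X, mean(X,2));  % center the data
resid = Xc - B*(B'*Xc);             % part of each point orthogonal to the subspace
mse_subspace = mean(sum(resid.^2,1))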
Edit: Added an example plot of the results for D = 3. I take advantage of the orthogonality of the other principal components here. Ignore the code if you want; it's just to demonstrate that the plane does in fact fit the data pretty well.
% plot in 3D
X0 = bsxfun(@minus,X,mean(X,2));
b1 = coeff(:,1); b2 = coeff(:,2);
y1 = b1'*X0; y2 = b2'*X0;
y1_min = min(y1); y1_max = max(y1);
y1_span = y1_max - y1_min;
y2_min = min(y2); y2_max = max(y2);
y2_span = y2_max - y2_min;
pad = 0.2;
y1_min = y1_min - pad*y1_span;
y1_max = y1_max + pad*y1_span;
y2_min = y2_min - pad*y2_span;
y2_max = y2_max + pad*y2_span;
[y1_m,y2_m] = meshgrid(linspace(y1_min,y1_max,5), linspace(y2_min,y2_max,5));
grid = bsxfun(@plus, bsxfun(@times,y1_m(:)',b1) + bsxfun(@times,y2_m(:)',b2), mean(X,2));
x = reshape(grid(1,:),size(y1_m));
y = reshape(grid(2,:),size(y1_m));
z = reshape(grid(3,:),size(y1_m));
figure(1); clf(1);
surf(x,y,z,'FaceColor','black','FaceAlpha',0.3,'EdgeAlpha',0.6);
hold on;
plot3(X(1,:),X(2,:),X(3,:),'.');
axis equal;
axis vis3d;
Edit2: When I say "principal component" I'm being a bit sloppy (or just plain wrong) with the wording. I'm actually referring to the orthogonal basis vectors that the principal components are expressed in.
Upvotes: 5