feature scaling and intercept

Question

it is a beginner question but I have a dataset of two features house sizes and number of bedrooms, So I'm working on Octave; So basically I try to do a feature scaling but as in the Design Matrix I added A column of ones (for tetha0) So I try to do a mean normalisation : (x- mean(x)/std(x) but in the columns of one obviously the mean is 1 since there is only 1 in every line So when I do this the intercept colum is set to 0 :

 mu = mean(X)
mu =

  1.0000   2000.6809      3.1702

 votric = X - mu
votric =

  0.00000    103.31915     -0.17021
  0.00000   -400.68085     -0.17021
  0.00000    399.31915     -0.17021
  0.00000   -584.68085     -1.17021
  0.00000    999.31915      0.82979

So the first column shouldn't be left out of the mean normalization ?

Tasos Papastylianou · Accepted Answer

Yes, you're supposed to normalise the original dataset over all observations first, and only then add the bias term (i.e. the 'column of ones').

The point of normalisation is to allow the various features to be compared on an equal basis, which speeds up optimization algorithms significantly.

The bias (i.e. column of ones) is technically not part of the features. It is just a mathematical convenience to allow us to use a single matrix multiplication to obtain our result in a computationally and notationally efficient manner.

In other words, instead of saying Y = bias + weight1 * X1 + weight2 * X2 etc, you create an imaginary X0 = 1, and denote the bias as weight0, which then allows you to express it in a vectorised fashion as follows: Y = weights * X

"Normalising" the bias term does not make sense, because clearly that would make X0 = 0, the effect of which would be that you would then discard the effect of the bias term altogether. So yes, normalise first, and only then add 'ones' to the normalised features.

PS. I'm going on a limb here and guessing that you're coming from Andrew Ng's machine learning course on coursera. You will see in ex1_multi.m that this is indeed what he's doing in his code (line 52).

% Scale features and set them to zero mean
fprintf('Normalizing Features ...
');

[X mu sigma] = featureNormalize(X);

% Add intercept term to X
X = [ones(m, 1) X];

feature scaling and intercept

Answers (1)

Related Questions