Reputation: 6290
Let's say I have the following two vectors:
x = [(10-1).*rand(7,1) + 1; randi(10,1,1)];
y = [(10-1).*rand(7,1) + 1; randi(10,1,1)];
The first seven elements are continuous values in the range [1,10]. The last element is an integer in the range [1,10].
Now I would like to compute the Euclidean distance between x and y. I think the integer element is a problem: the continuous elements can get arbitrarily close to each other, but the integer element always changes in steps of one. So the distance is biased towards the integer element.
How can I calculate something like a normalized euclidean distance on it?
Upvotes: 3
Views: 25069
Reputation: 545
From Euclidean Distance - raw, normalized and double‐scaled coefficients
SYSTAT, Primer 5, and SPSS provide Normalization options for the data so as to permit an investigator to compute a distance coefficient which is essentially “scale free”. Systat 10.2’s normalised Euclidean distance produces its “normalisation” by dividing each squared discrepancy between attributes or persons by the total number of squared discrepancies (or sample size).
Frankly, I can see little point in this standardization – as the final coefficient still remains scale‐sensitive. That is, it is impossible to know whether the value indicates high or low dissimilarity from the coefficient value alone.
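To make the quoted point concrete, here is a minimal sketch. It assumes the description means "sum the squared differences, divide by the number of terms, then take the square root"; that reading is my interpretation of the quote, not confirmed SYSTAT behaviour, and the sample vectors are made up for illustration.

```matlab
% Made-up example vectors (not from the question).
x = [2; 4; 6; 8];
y = [1; 4; 8; 8];

n = numel(x);                         % number of squared discrepancies

d_raw  = norm(x - y);                 % ordinary Euclidean distance
d_norm = sqrt(sum((x - y).^2) / n);   % assumed "normalised" variant

% d_norm is just d_raw / sqrt(n): it is still expressed in the units
% of the data, so rescaling x and y rescales d_norm as well.
ratio = d_norm / d_raw;               % equals 1 / sqrt(n)
```

Under this reading the coefficient only removes the dependence on the number of elements, which is consistent with the remark that it stays scale-sensitive.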
Upvotes: 1
Reputation: 470
I would rather normalise x and y before calculating the distance and then vanilla Euclidean would suffice.
In your example
x_norm = (x - 1) / 9; % normalised x
y_norm = (y - 1) / 9; % normalised y
dist = norm(x_norm - y_norm); % Euclidean distance between normalised x, y
However, I am not sure whether having an integer element introduces some sort of bias, but we have already gotten somewhat off-topic for Stack Overflow :)
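The same idea can be written with explicit bounds, which may help if the elements do not all share the range [1,10]. This is only a sketch: `lo`, `hi` and `dist_rel` are names I made up for illustration, and dividing by sqrt(n) is an optional extra step, not part of the approach above.

```matlab
% Known lower and upper bound of every element (hypothetical names).
lo = ones(8, 1);          % all elements have a minimum of 1
hi = 10 * ones(8, 1);     % and a maximum of 10

x = [(10-1).*rand(7,1) + 1; randi(10,1,1)];
y = [(10-1).*rand(7,1) + 1; randi(10,1,1)];

% Min-max scaling: each element is mapped to [0, 1].
x_norm = (x - lo) ./ (hi - lo);
y_norm = (y - lo) ./ (hi - lo);

dist = norm(x_norm - y_norm);        % Euclidean distance after scaling

% Optional: each scaled difference is at most 1, so dist <= sqrt(8).
% Dividing by sqrt(numel(x)) gives a value in [0, 1].
dist_rel = dist / sqrt(numel(x));
```

Note that scaling does not change the fact that the last element moves in steps of 1/9 after normalisation; it only puts all elements on a common range.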
Upvotes: 3
Reputation: 5822
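According to Wolfram Alpha, and an answer on Cross Validated, the normalized Euclidean distance is defined by:
$$
d(x, y) = \frac{1}{2} \, \frac{\lVert (x - \bar{x}) - (y - \bar{y}) \rVert^2}{\lVert x - \bar{x} \rVert^2 + \lVert y - \bar{y} \rVert^2}
$$
where $\bar{x}$ and $\bar{y}$ are the means of the elements of x and y.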
You can calculate it with MATLAB by using:
0.5*(std(x-y)^2) / (std(x)^2+std(y)^2)
Alternatively, you can use:
0.5*((norm((x-mean(x))-(y-mean(y)))^2)/(norm(x-mean(x))^2+norm(y-mean(y))^2))
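As a quick sanity check, the two expressions can be compared on the vectors from the question. This is a minimal sketch; the variable names `d_std`, `d_norm`, `xc` and `yc` are mine. The two forms agree because `std` divides every squared term by the same factor `n-1`, which cancels between numerator and denominator.

```matlab
x = [(10-1).*rand(7,1) + 1; randi(10,1,1)];
y = [(10-1).*rand(7,1) + 1; randi(10,1,1)];

% First form: based on standard deviations (squared, i.e. variances).
d_std = 0.5 * (std(x - y)^2) / (std(x)^2 + std(y)^2);

% Second form: based on norms of the mean-centred vectors.
xc = x - mean(x);
yc = y - mean(y);
d_norm = 0.5 * norm(xc - yc)^2 / (norm(xc)^2 + norm(yc)^2);

% The difference should be at the level of floating-point rounding.
diff_forms = abs(d_std - d_norm);
```

The result always lies in [0, 1], since norm(xc - yc)^2 can be at most 2*(norm(xc)^2 + norm(yc)^2). Because both vectors are mean-centred first, adding a constant to every element of x or y does not change the value.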
Upvotes: 10