jeza

Reputation: 299

R: results differ when calculating Euclidean distance between two vectors with different methods

Suppose that I have two vectors.

x1 = c(-1, 2, 3)
x2 = c(4, 0, -3)

To calculate the Euclidean distance, I tried three different methods:

1. The built-in function norm

s = cbind(x1, x2)
norm(s, "2")
#[1] 5.797896

2. Hand calculation

sqrt(sum((x2 - x1) ^ 2))
#[1] 8.062258
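Note that the placement of the parentheses matters in the hand calculation: squaring the sum of the differences is not the same as summing the squared differences, and only the latter is the Euclidean distance. A minimal check using the question's x1 and x2 (the name d is just for illustration):

```r
x1 <- c(-1, 2, 3)
x2 <- c(4, 0, -3)
d <- x2 - x1               # c(5, -2, -6)

sum(d) ^ 2                 # square of the sum: (-3)^2 = 9
sum(d ^ 2)                 # sum of the squares: 25 + 4 + 36 = 65

sqrt(sum(d) ^ 2)           # 3, not a distance
sqrt(sum(d ^ 2))           # 8.062258, the Euclidean distance
```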

3. A custom function

lpnorm <- function(x, p) {
  n <- sum(abs(x) ^ p) ^ (1 / p)
  return(n)
}

lpnorm(s, 2)
#[1] 6.244998

Why do I get different results?

If I am doing something wrong, how can I fix it?

Upvotes: 2

Views: 1432

Answers (1)

Zheyuan Li

Reputation: 73325

You need to define s = x2 - x1.

norm(s, "2")
#[1] 8.062258

sqrt(sum(s ^ 2))  ## or: sqrt(c(crossprod(s)))
#[1] 8.062258

lpnorm(s, 2)
#[1] 8.062258
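If you compute this often, a small helper avoids building s by hand. This is a minimal sketch; the name euclid_dist is hypothetical and not part of base R. It is cross-checked against base R's dist(), which measures distances between the rows of a matrix:

```r
x1 <- c(-1, 2, 3)
x2 <- c(4, 0, -3)

# Hypothetical helper: Euclidean distance between two numeric vectors
euclid_dist <- function(a, b) {
  stopifnot(length(a) == length(b))
  sqrt(sum((a - b) ^ 2))
}

euclid_dist(x1, x2)               # 8.062258
# dist() works on the rows of a matrix, so stack the vectors with rbind()
as.numeric(dist(rbind(x1, x2)))   # 8.062258
```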

If you define s = cbind(x1, x2), none of the options you listed computes the Euclidean distance between x1 and x2, but we can still make them output the same value. In that case they all compute the L2 norm of the vector c(x1, x2), which for the matrix s is its Frobenius norm (type "F" in norm):

norm(s, "F")
#[1] 6.244998

sqrt(sum(s ^ 2))
#[1] 6.244998

lpnorm(s, 2)
#[1] 6.244998
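The reason all three agree is that the Frobenius norm of a matrix is simply the L2 norm of all its entries strung out as one vector. A quick check, reusing x1 and x2 from the question:

```r
x1 <- c(-1, 2, 3)
x2 <- c(4, 0, -3)
s <- cbind(x1, x2)

# Flattening the matrix column by column gives back c(x1, x2)
identical(as.vector(s), c(x1, x2))   # TRUE

norm(s, "F")                         # 6.244998, Frobenius norm of the 3 x 2 matrix
sqrt(sum(c(x1, x2) ^ 2))             # sqrt(39) = 6.244998, L2 norm of the flattened entries
```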

Finally, norm is not the usual way to compute a distance; it is intended for matrix norms. When you do norm(cbind(x1, x2), "2"), it computes the matrix 2-norm (the spectral norm), which is the largest singular value of the matrix cbind(x1, x2).
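To see that norm(s, "2") is the spectral norm, compare it with the largest singular value from svd(), or equivalently the square root of the largest eigenvalue of crossprod(s), i.e. t(s) %*% s. A minimal sketch:

```r
x1 <- c(-1, 2, 3)
x2 <- c(4, 0, -3)
s <- cbind(x1, x2)

norm(s, "2")                              # 5.797896
svd(s)$d[1]                               # largest singular value of s
sqrt(max(eigen(crossprod(s))$values))     # sqrt of largest eigenvalue of t(s) %*% s
```

All three lines give the same value; none of them equals the Euclidean distance between x1 and x2.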


So my problem was how I defined s. OK, what if I have more than three vectors?

In that case you want a pairwise Euclidean distance matrix. See ?dist.
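A minimal sketch of dist() with more than two vectors; x3 is a made-up vector for illustration only. dist() expects one vector per row and stores the lower triangle of the pairwise distance matrix; as.matrix() expands it to the full symmetric matrix:

```r
x1 <- c(-1, 2, 3)
x2 <- c(4, 0, -3)
x3 <- c(0, 1, 1)                  # hypothetical third vector, for illustration only

X <- rbind(x1, x2, x3)            # one vector per row; row names become labels
D <- dist(X)                      # Euclidean distance by default
M <- as.matrix(D)                 # full 3 x 3 symmetric matrix, zeros on the diagonal
M

# Other metrics are chosen with the `method` argument
as.matrix(dist(X, method = "manhattan"))
```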

I have a training set (containing three or more rows) and one test set (one row). I would like to calculate the Euclidean distance, or maybe other distances, between them. This is why I want to make sure the distance calculation is right.

You want the distance between one vector and each of many others, and the result is a vector?

set.seed(0)
X_train <- matrix(runif(10), 5, 2)
x_test <- runif(2)
S <- t(X_train) - x_test

apply(S, 2, norm, "2")  ## don't try other types than "2"
#[1] 0.8349220 0.7217628 0.8012416 0.6841445 0.9462961

apply(S, 2, lpnorm, 2)
#[1] 0.8349220 0.7217628 0.8012416 0.6841445 0.9462961

sqrt(colSums(S ^ 2))  ## only for L2-norm
#[1] 0.8349220 0.7217628 0.8012416 0.6841445 0.9462961
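Since other distances are also of interest, the same column-wise pattern extends to any Minkowski (Lp) distance by changing p in lpnorm (p = 1 is the Manhattan distance). The sketch below cross-checks against dist(): with x_test placed in the first row of rbind(x_test, X_train), the first nrow(X_train) entries of the dist object are the distances from x_test to each training row. The names manhattan, euclid, mink3, Z, d1, d2 and d3 are illustrative only.

```r
# lpnorm as defined in the question
lpnorm <- function(x, p) sum(abs(x) ^ p) ^ (1 / p)

set.seed(0)
X_train <- matrix(runif(10), 5, 2)
x_test <- runif(2)
S <- t(X_train) - x_test              # column j holds X_train[j, ] - x_test

manhattan <- apply(S, 2, lpnorm, 1)   # L1 (Manhattan) distances
euclid    <- apply(S, 2, lpnorm, 2)   # L2 (Euclidean) distances
mink3     <- apply(S, 2, lpnorm, 3)   # Minkowski distances with p = 3

# Cross-check with dist(): x_test is row 1, so the first n entries of
# each dist object are the distances from x_test to the n training rows
n <- nrow(X_train)
Z <- rbind(x_test, X_train)
d1 <- as.numeric(dist(Z, method = "manhattan"))[1:n]
d2 <- as.numeric(dist(Z))[1:n]
d3 <- as.numeric(dist(Z, method = "minkowski", p = 3))[1:n]
all.equal(manhattan, d1)   # TRUE
all.equal(euclid, d2)      # TRUE
all.equal(mink3, d3)       # TRUE
```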

I would stress again that norm fails on a vector unless type = "2". ?norm clearly says that this function is intended for matrices. What norm does is very different from your self-defined lpnorm function: lpnorm computes a vector norm, while norm computes a matrix norm. Even "L2" means different things for a matrix and for a vector.
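A minimal sketch of that difference, using s = x2 - x1 from the answer. The call norm(v, "F") is wrapped in tryCatch() so you can see what your R version does without stopping the script; converting the vector with as.matrix() gives a one-column matrix, for which the matrix norms happen to line up with familiar vector norms:

```r
v <- c(5, -2, -6)    # x2 - x1 from the question

norm(v, "2")         # works on a plain vector: 8.062258

# Types other than "2" expect a matrix; capture any error instead of stopping
res <- tryCatch(norm(v, "F"), error = function(e) conditionMessage(e))
print(res)

# As a one-column matrix, the matrix norms coincide with vector norms:
m <- as.matrix(v)
norm(m, "F")         # 8.062258, the vector L2 norm
norm(m, "O")         # 13, max absolute column sum = vector L1 norm here
norm(m, "I")         # 6, max absolute row sum = largest |v_i| here
```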

Upvotes: 3
