Zack

Reputation: 1235

Better way to calculate Euclidean distance with R

I am trying to calculate the Euclidean distance for the iris dataset. Basically, I want to calculate the distance between each pair of objects (rows). I have code that works as follows:

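n <- nrow(iris)                       # 150 observations
m <- matrix(0, nrow = n, ncol = n)    # preallocate the n x n distance matrix

for (i in 1:n) {
  for (j in 1:n) {
    # Euclidean distance between rows i and j over the four numeric columns
    m[i, j] <- sqrt((iris[i, 1] - iris[j, 1])^2 +
                    (iris[i, 2] - iris[j, 2])^2 +
                    (iris[i, 3] - iris[j, 3])^2 +
                    (iris[i, 4] - iris[j, 4])^2)
  }
}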

Although this works, I don't think this is a good way of writing R-style code. I know that R has a built-in function to calculate Euclidean distance. Without using the built-in function, I'd like to know of better code (faster and with fewer lines) that does the same as my code.
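For reference, one common loop-free approach that avoids dist() (not taken from either answer) expands the squared distance as ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y and computes all pairwise cross products at once with tcrossprod(). This is a minimal sketch, assuming only the four numeric columns of iris are wanted:

```r
X  <- as.matrix(iris[, 1:4])                   # 150 x 4 numeric matrix
sq <- rowSums(X^2)                             # squared norm of each row
# ||x_i||^2 + ||x_j||^2 - 2 * (x_i . x_j) for every pair (i, j) at once
D2 <- outer(sq, sq, "+") - 2 * tcrossprod(X)
D2[D2 < 0] <- 0                                # clamp tiny negatives caused by rounding
diag(D2) <- 0                                  # a point's distance to itself is exactly 0
m <- sqrt(D2)                                  # 150 x 150 Euclidean distance matrix
```

The subtraction can lose some floating-point precision for nearly identical rows, which is why the result is clamped; for exact agreement with a loop, compare with all.equal() rather than ==.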

Upvotes: 1

Views: 3036

Answers (2)

Quigi

Reputation: 314

Or stay with the standard stats package and its dist() function:

m <- dist(iris[, 1:4])

This gives you an object of class dist, which compactly stores only the lower triangle (all you need, since the distance matrix is symmetric). If, e.g., you'd like to look at some elements, you can convert it to an ordinary full symmetric matrix with as.matrix():

> as.matrix(m)[1:5,1:5]
          1         2        3         4         5
1 0.0000000 0.5385165 0.509902 0.6480741 0.1414214
2 0.5385165 0.0000000 0.300000 0.3316625 0.6082763
3 0.5099020 0.3000000 0.000000 0.2449490 0.5099020
4 0.6480741 0.3316625 0.244949 0.0000000 0.6480741
5 0.1414214 0.6082763 0.509902 0.6480741 0.0000000
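To illustrate how compact the dist object is, here is a small sketch that reads a single distance straight from the stored lower triangle, without building the full 150 x 150 matrix. The helper dist_at() is hypothetical, written for this example; it relies on dist storing the lower triangle column by column.

```r
m <- dist(iris[, 1:4])
n <- attr(m, "Size")   # number of observations (150)
length(m)              # 11175, i.e. n * (n - 1) / 2 stored distances instead of n^2

# Hypothetical helper: look up d(i, j) in the compact vector.
# For i < j the pair sits at position n*(i - 1) - i*(i - 1)/2 + (j - i).
dist_at <- function(d, i, j) {
  if (i == j) return(0)                      # distance to itself
  if (i > j) { tmp <- i; i <- j; j <- tmp }  # the distance is symmetric
  n <- attr(d, "Size")
  d[n * (i - 1) - i * (i - 1) / 2 + (j - i)]
}

dist_at(m, 1, 2)       # 0.5385165, matching as.matrix(m)[1, 2]
```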

Upvotes: 0

Konrad Rudolph

Reputation: 545578

The part inside the loop can be written as

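m[i, j] = sqrt(sum((iris[i, 1:4] - iris[j, 1:4]) ^ 2))

Only columns 1 to 4 are numeric; column 5 (Species) is a factor, and including it would turn the sum into NA.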

I’d keep the nested loop; there’s nothing wrong with it here.
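A sketch of the complete nested loop built around that one-liner, with two changes that are not part of the answer: the numeric columns are converted to a matrix first (row indexing on a matrix is typically cheaper than on a data frame), and only pairs with j > i are computed, since the result is symmetric and the diagonal is zero.

```r
X <- as.matrix(iris[, 1:4])   # numeric columns only, as a matrix
n <- nrow(X)
m <- matrix(0, n, n)          # diagonal stays 0

for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    d <- sqrt(sum((X[i, ] - X[j, ])^2))  # Euclidean distance between rows i and j
    m[i, j] <- d
    m[j, i] <- d                         # fill the mirrored entry
  }
}
```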

Upvotes: 3
