Bushra Naseem

Reputation: 39

Calculating Euclidean Distance for Large DataSets

I have to calculate the Euclidean distance between train and test data. The train data has 1389 observations and the test data has 364. It is the data from handwritten ZIP codes on envelopes from U.S. postal mail, downloaded from the website of "The Elements of Statistical Learning".

I am a beginner and have only read the data into R so far. I don't know how to start calculating the distances between the train and test data. Can anyone give me an idea of how to write a loop for this?

I would be thankful.

Upvotes: 3

Views: 3931

Answers (1)

flodel

Reputation: 89057

For Euclidean distances, I like using rdist from the fields package. One advantage over dist from the stats package is that it can take two matrices as input, with one observation per row:

train.data <- matrix(runif(1389*2), ncol = 2)
test.data  <- matrix(runif(364*2),  ncol = 2)

library(fields)
distances <- rdist(train.data, test.data)
dim(distances)
# [1] 1389  364
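
If installing fields is not an option, the same train-by-test distance matrix can be sketched in base R. The two helpers below, cross_dist and loop_dist, are hypothetical names made up for this illustration and are not part of any package. The first is vectorized; the second uses an explicit loop over the test rows, since the question asked for one. Both assume numeric matrices with one observation per row and the same number of columns.

```r
# Hypothetical helper: pairwise Euclidean distances between the rows
# of two matrices, vectorized with base R only.
cross_dist <- function(a, b) {
  # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * x.y, for every row pair at once
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  sqrt(pmax(sq, 0))  # clamp tiny negative values caused by rounding
}

# Hypothetical helper: the same result with an explicit loop over test rows.
loop_dist <- function(a, b) {
  out <- matrix(NA_real_, nrow(a), nrow(b))
  for (j in seq_len(nrow(b))) {
    # subtract test row j from every train row, then take row-wise norms
    out[, j] <- sqrt(rowSums(sweep(a, 2, b[j, ])^2))
  }
  out
}

set.seed(1)  # simulated data, same shapes as in the question
train.data <- matrix(runif(1389 * 2), ncol = 2)
test.data  <- matrix(runif(364 * 2),  ncol = 2)

d1 <- cross_dist(train.data, test.data)
d2 <- loop_dist(train.data, test.data)
dim(d1)
# [1] 1389  364
all.equal(d1, d2)  # the two versions should agree up to rounding
```

Entry [i, j] is the distance between train row i and test row j. The loop version is easier to follow; the vectorized one is usually faster on larger inputs.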

Upvotes: 6
