vectorizing for-loops that use subset()

Question

For each point (x,y) in a data frame, I want to calculate the sum of the Euclidean distances from that point to all other points in the data frame that do not have the same 'group' label. Here is a hacky for-loop version of what I'm trying to achieve:

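# some fake data: three groups of three points each
d <- data.frame(group=rep(c('a','b','c'),each=3), x=sample(1:9), y=sample(1:9), z=NA)
for (i in seq_len(nrow(d))) {
  # all points whose group differs from that of point i
  d2 <- subset(d, group != d$group[i])
  # sum of Euclidean distances from point i to those points
  d$z[i] <- sum(sqrt((d$x[i]-d2$x)^2 + (d$y[i]-d2$y)^2))
}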

For example, the desired value for point a1 should be the sum of distances from a1 to each of b1, b2, b3, c1, c2, c3, but NOT including the distances a1-a2 or a1-a3. Is there a vectorized way to accomplish this? I'm sure it's an obvious solution... I've tried various configurations of by() and apply() but can't seem to hit on the answer.

Backlin · Accepted Answer

A neat and efficient way to solve this is to precalculate all pairwise distances once and then subset the distances rather than the points, which avoids repeating the same calculations.

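# pairwise Euclidean distances between all points, as a full square matrix
dists <- as.matrix(dist(d[c("x", "y")]))
# for each row i, sum the distances to the points whose group differs from row i's
d$z <- sapply(seq_along(d$group), function(i) sum(dists[i, d$group != d$group[i]]))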
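
The sapply() call still visits the rows one at a time. If a fully vectorized form is wanted, one possible variant (a sketch, not part of the original answer) is to build a logical mask of "different group" pairs with outer() and take row sums of the masked distance matrix. The names `mask` and the fixed seed are illustrative assumptions, not from the question.

```r
# Reproducible fake data in the same shape as the question's (seed is arbitrary)
set.seed(1)
d <- data.frame(group = rep(c("a", "b", "c"), each = 3),
                x = sample(1:9), y = sample(1:9), z = NA)

# Full square matrix of pairwise Euclidean distances
dists <- as.matrix(dist(d[c("x", "y")]))

# mask[i, j] is TRUE when points i and j belong to different groups;
# as.character() guards against 'group' being stored as a factor
g <- as.character(d$group)
mask <- outer(g, g, "!=")

# Multiplying by the mask zeroes same-group distances (TRUE = 1, FALSE = 0),
# so each row sum is the total distance to points in other groups
d$z <- unname(rowSums(dists * mask))
```

This trades memory for speed: both `dists` and `mask` are n-by-n matrices, so for a very large number of points the row-wise sapply() version may be the more practical choice.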
