Reputation: 909
I'm helping to put together a spatial R lab for a third-year class, and one of the tasks will be to identify the site that is closest on average (i.e. has the lowest mean distance) to a set of other sites.
I have a distance matrix dist_m that I produced using gdistance::costDistance, which looks something like this:
# Sample data
m <- matrix(c(2, 1, 8, 5,
              7, 6, 3, 4,
              9, 3, 2, 8,
              1, 3, 7, 4),
            nrow = 4,
            ncol = 4,
            byrow = TRUE)
# Sample distance matrix
dist_m <- dist(m)
dist_m
when printed looks like:
          1         2         3
2  8.717798
3  9.899495  5.477226
4  2.645751  7.810250 10.246951
Desired output: From this dist object I want to be able to identify the index value (1, 2, 3 or 4) that has the lowest average distance. In this example, it would be index 4, which has an average distance of 6.90. Ideally, I'd also like the mean distance returned too (6.90).
I can find the mean distance of an individual index by doing something like this:
# Convert distance matrix to matrix
m = as.matrix(dist_m)
# Set diagonals and upper triangle to NA
m[upper.tri(m)] = NA
m[m == 0] = NA
# Calculate mean for index
mean(c(m[4,], m[,4]), na.rm = TRUE)
However, I ideally want a solution that identifies the index with the minimum mean distance directly, rather than having to plug in index values manually (the actual dataset will be much larger than this).
As this is for a university class, I'd like to keep any solution as simple as possible: for-loops and apply functions are likely to be difficult to grasp for students with little experience in R.
Upvotes: 1
Views: 631
Reputation: 10671
If you want to use the tidyverse, this is one way:
library(tidyverse)
as.matrix(dist_m) %>%
  as_tibble() %>%                             # as.tibble() is deprecated
  rownames_to_column(var = "start_node") %>%
  gather(end_node, dist, -start_node) %>%     # go long
  filter(dist != 0) %>%                       # drop identity diagonal
  group_by(start_node) %>%                    # now summarise
  summarise(mean_dist = mean(dist)) %>%
  filter(mean_dist == min(mean_dist))         # choose minimum mean_dist
# A tibble: 1 x 2
  start_node mean_dist
       <chr>     <dbl>
1          4  6.900984
It's a little long but the pipes make it easy to see what is happening at each line and you get a nice output.
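For what it's worth, gather() is superseded in recent tidyr releases. Here is a rough sketch of the same pipeline using pivot_longer() and slice_min(), assuming tidyr >= 1.0 and dplyr >= 1.0 are installed. The object name closest is just for illustration, and the diagonal is dropped by comparing node labels rather than testing for zero, so genuine zero distances between distinct sites would be kept.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Sample data from the question
m <- matrix(c(2, 1, 8, 5,
              7, 6, 3, 4,
              9, 3, 2, 8,
              1, 3, 7, 4),
            nrow = 4, ncol = 4, byrow = TRUE)
dist_m <- dist(m)

closest <- as.matrix(dist_m) %>%
  as.data.frame() %>%                          # keeps row names "1".."4"
  rownames_to_column(var = "start_node") %>%
  pivot_longer(-start_node,
               names_to = "end_node",
               values_to = "dist") %>%         # go long
  filter(start_node != end_node) %>%           # drop each site's distance to itself
  group_by(start_node) %>%
  summarise(mean_dist = mean(dist)) %>%
  slice_min(mean_dist, n = 1)                  # keep the row(s) with the smallest mean

closest
```

Note that slice_min() keeps ties by default, so more than one row can come back if several sites share the minimum.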
Upvotes: 1
Reputation: 3650
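Try this, starting from the full symmetric matrix (masking the upper triangle would leave each row with only part of that site's distances):
m <- as.matrix(dist_m)
diag(m) <- NA  # ignore each site's zero distance to itself
rMeans <- rowMeans(m, na.rm = TRUE)
names(rMeans) <- NULL
which(rMeans == min(rMeans, na.rm = TRUE))
# [1] 4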
Or as a function:
minMeanDist <- function(x) {
  m <- as.matrix(x)
  diag(m) <- NA  # drop self-distances but keep every row complete
  rMeans <- rowMeans(m, na.rm = TRUE)
  names(rMeans) <- NULL
  mmd <- min(rMeans, na.rm = TRUE)
  ind <- which(rMeans == mmd)
  list(index = ind, min_mean_dist = mmd)
}
minMeanDist(dist_m)
# $index
# [1] 4
#
# $min_mean_dist
# [1] 6.900984
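If even a function feels like too much for the students, a shorter sketch is possible. It relies on the fact that as.matrix() on a dist object returns the full symmetric matrix with zeros on the diagonal, so each row sum already covers only the other sites. The names d, n, mean_dist and closest are just illustrative.

```r
# Sample data from the question
m <- matrix(c(2, 1, 8, 5,
              7, 6, 3, 4,
              9, 3, 2, 8,
              1, 3, 7, 4),
            nrow = 4, ncol = 4, byrow = TRUE)
dist_m <- dist(m)

d <- as.matrix(dist_m)             # full symmetric matrix, zeros on the diagonal
n <- nrow(d)

# Each row sum adds a zero for the site itself, so divide by n - 1
mean_dist <- rowSums(d) / (n - 1)

closest <- which.min(mean_dist)    # position of the smallest mean (first one if tied)
closest
mean_dist[closest]
```

Unlike which(x == min(x)), which.min() returns only the first minimum, so ties would need separate handling.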
Upvotes: 1