I have a biological data set in which I want to calculate the distance between centroids, where each centroid represents a given year (so distances are calculated sequentially, from one year to the next). I'm exploring usedist::dist_between_centroids() to calculate the distance in high-dimensional space, but it seems quite arduous, since the function needs a vector of row indices for each of the two groups being compared (here, two years). I've explored vegan::adonis() as an alternative, but I can't figure out how to extract the distances from it. I've attached some sample data using the dune data set and recoded one of the factors as 'year'. My actual data set covers ~20 years, so calculating the distances manually as I've done below is not practical. I think a loop with dist_between_centroids() might accomplish this task, but I'm not sure how to specify the grouping vectors in the loop.
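# Species and environmental data
library(vegan)    # vegdist(), dune and dune.env data sets
library(usedist)  # dist_between_centroids()
library(dplyr)    # %>%, arrange(), recode_factor()
dune <- read.delim('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.spe.txt', row.names = 1)
dune.env <- read.delim('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.env.txt', row.names = 1)
# data() loads vegan's built-in dune and dune.env, replacing the two objects read above;
# the column range 6:35 below assumes vegan's versions (5 environmental + 30 species columns)
data(dune)
data(dune.env)
all_data <- cbind(dune.env, dune) %>%
  arrange(Use)
# recode the land-use levels as years
all_data$Use <- recode_factor(all_data$Use, "Hayfield" = "2017")
all_data$Use <- recode_factor(all_data$Use, "Haypastu" = "2018")
all_data$Use <- recode_factor(all_data$Use, "Pasture" = "2019")
bio_data <- all_data[, 6:35]  # species abundances only
bio_distmat <- vegdist(bio_data, method = "bray", na.rm = TRUE)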
# store the distances in a data frame, one row per consecutive pair of years
dist_between_mat <- as.data.frame(matrix(ncol = 3, nrow = 2))
colnames(dist_between_mat) <- c("start_centroid", "end_centroid", "distance")
dist_between_mat[1, 1] <- "2017"
dist_between_mat[1, 2] <- "2018"
dist_between_mat[1, 3] <- dist_between_centroids(bio_distmat, 1:7, 8:15)   # distance between 2017 and 2018
dist_between_mat[2, 1] <- "2018"
dist_between_mat[2, 2] <- "2019"
dist_between_mat[2, 3] <- dist_between_centroids(bio_distmat, 8:15, 16:20) # distance between 2018 and 2019
You can do this with a simple for-loop. But why write simple code when we can use "tidy" principles instead?
Here is a solution that uses purrr::map2_dfr() to iterate over pairs of consecutive start and end years, generates a one-row tibble for each pair, and then concatenates the rows into a final tibble.
Note that in your reproducible example the years/levels are in reverse chronological order. I use the order of the factor levels as it is, without converting the levels to years, so make sure that this is the order you intend.
levels(all_data$Use)
#> [1] "2019" "2018" "2017"
n <- nlevels(all_data$Use)
start <- levels(all_data$Use)[1:(n - 1)]
start
#> [1] "2019" "2018"
end <- levels(all_data$Use)[2:n]
end
#> [1] "2018" "2017"
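library(purrr)   # map2_dfr()
library(tibble)  # tibble()

map2_dfr(start, end, ~ {
  # row indices of the observations belonging to each of the two years
  idx1 <- which(all_data$Use == .x)
  idx2 <- which(all_data$Use == .y)
  tibble(
    start_centroid = .x,
    end_centroid = .y,
    distance = dist_between_centroids(bio_distmat, idx1, idx2)
  )
})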
#> # A tibble: 2 × 3
#> start_centroid end_centroid distance
#> <chr> <chr> <dbl>
#> 1 2019 2018 0.210
#> 2 2018 2017 0.327
Created on 2022-07-27 by the reprex package (v2.0.1)
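For comparison, the simple for-loop mentioned at the start of this answer might look like the sketch below. It is a self-contained illustration, not the answerer's code: it assumes vegan's built-in dune and dune.env, rebuilds a year factor directly from dune.env$Use (the mapping Hayfield = 2017, Haypastu = 2018, Pasture = 2019 mirrors the recoding in the question), and sets the levels in chronological order. The names year, lv and result are hypothetical.

```r
library(vegan)    # dune, dune.env, vegdist()
library(usedist)  # dist_between_centroids()

data(dune)
data(dune.env)

# Hypothetical year variable: map each land-use level to a year, as in the question
year_map <- c(Hayfield = "2017", Haypastu = "2018", Pasture = "2019")
year <- factor(unname(year_map[as.character(dune.env$Use)]),
               levels = c("2017", "2018", "2019"))  # chronological level order

bio_distmat <- vegdist(dune, method = "bray")

lv <- levels(year)
# one row per consecutive pair of levels: (lv[1], lv[2]), (lv[2], lv[3]), ...
result <- data.frame(
  start_centroid = lv[-length(lv)],
  end_centroid = lv[-1],
  distance = NA_real_,
  stringsAsFactors = FALSE
)
for (i in seq_len(length(lv) - 1)) {
  # row indices of the sites in each year; these index rows of bio_distmat
  idx1 <- which(year == lv[i])
  idx2 <- which(year == lv[i + 1])
  result$distance[i] <- dist_between_centroids(bio_distmat, idx1, idx2)
}
result
```

Because the row indices are computed with which() on the same rows that were passed to vegdist(), the data do not need to be sorted by year first. With ~20 levels, the same loop yields one row per consecutive pair without further changes.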
vegan::adonis (or vegan::adonis2) does not return that information, but vegan::betadisper does. Its result object contains distances, which are the distances of each observation to its respective group centroid, and the element group holds the corresponding group for each observation. If you want only one group, you must give a constant vector as the group.
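A minimal sketch of reading those elements, assuming vegan's built-in dune data and using the original Use factor as the grouping. Note that betadisper() defaults to type = "median" (distances to the spatial median), so type = "centroid" is passed explicitly here to get distances to the group centroids. These are within-group distances (site to its own group's centroid), which describe dispersion within each year rather than the year-to-year distances between centroids. The names bd and disp are illustrative only.

```r
library(vegan)  # dune, dune.env, vegdist(), betadisper()

data(dune)
data(dune.env)

bio_distmat <- vegdist(dune, method = "bray")

# type = "centroid": distances to group centroids instead of the default spatial medians
bd <- betadisper(bio_distmat, group = dune.env$Use, type = "centroid")

head(bd$distances)  # distance of each site to the centroid of its own group
head(bd$group)      # group membership matching each element of bd$distances

# mean distance to centroid per group (a per-group dispersion summary)
disp <- tapply(bd$distances, bd$group, mean)
disp
```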