Joshua Smith

Reputation: 155

R extract distance between centroids to data frame using Vegan

I have a biological data set in which each centroid represents a given year, and I want to calculate the distance between the centroids of consecutive years (so the distances are calculated sequentially). I'm exploring usedist::dist_between_centroids() to calculate the distances in high-dimensional space, but it seems quite arduous because the function requires the items in each group (in this case, year) as vector inputs. I've explored vegan::adonis() as an alternative, but I can't figure out how to extract the distances from it. I've attached some sample data using dune and recoded one of the factors as 'year'. My actual data set consists of ~20 years' worth of data, so manually calculating the distances as I've done below is not practical. I think a loop with dist_between_centroids() might accomplish this task, but I'm not sure how to specify the grouping vectors in the loop.


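# Species and environmental data
library(vegan)    # vegdist() and the dune data
library(usedist)  # dist_between_centroids()
library(dplyr)    # %>%, arrange(), recode_factor()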

dune <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.spe.txt', row.names = 1)

dune.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.env.txt', row.names = 1)

# Note: data() loads vegan's built-in copies and overwrites the objects read above
data(dune)
data(dune.env)

all_data <- cbind(dune.env, dune) %>%
              arrange(Use)

all_data$Use <- recode_factor(all_data$Use, "Hayfield"="2017")
all_data$Use <- recode_factor(all_data$Use, "Haypastu"="2018")
all_data$Use <- recode_factor(all_data$Use, "Pasture"="2019")


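# Keep the species columns only (columns 1-5 are the environmental variables)
bio_data <- all_data[, 6:35]

# Bray-Curtis dissimilarities between samples
bio_distmat <- vegdist(bio_data, method = "bray", na.rm = TRUE)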


# Store the distances in a data frame
dist_between_mat <- as.data.frame(matrix(ncol=3, nrow=2))
colnames(dist_between_mat) <- c("start_centroid","end_centroid","distance")

dist_between_mat[1,1] = "2017"
dist_between_mat[1,2] = "2018"
dist_between_mat[1,3] = dist_between_centroids(bio_distmat, 1:7,8:15) #distance between 2017 and 2018

dist_between_mat[2,1] = "2018"
dist_between_mat[2,2] = "2019"
dist_between_mat[2,3] = dist_between_centroids(bio_distmat, 8:15,16:20) #distance between 2018 and 2019


Upvotes: 2

Views: 980

Answers (2)

dipetkov

Reputation: 3700

You can do this with a simple for-loop. But why write simple code when we can use "tidy" principles instead?

Here is a solution that iterates over the start years and the end years, generates a one-row tibble and then concatenates the rows into a final tibble.

Note that in your reproducible example the years/levels are in reverse chronological order. I use the levels ordering, without casting the levels to years, so make sure that this is the order you intend.

library(purrr)   # map2_dfr()
library(tibble)  # tibble()

levels(all_data$Use)
#> [1] "2019" "2018" "2017"

n <- nlevels(all_data$Use)

start <- levels(all_data$Use)[1:(n - 1)]
start
#> [1] "2019" "2018"
end <- levels(all_data$Use)[2:n]
end
#> [1] "2018" "2017"

map2_dfr(start, end, ~ {
  idx1 <- which(all_data$Use == .x)
  idx2 <- which(all_data$Use == .y)
  tibble(
    start_centroid = .x,
    end_centroid = .y,
    distance = dist_between_centroids(bio_distmat, idx1, idx2)
  )
})
#> # A tibble: 2 × 3
#>   start_centroid end_centroid distance
#>   <chr>          <chr>           <dbl>
#> 1 2019           2018            0.210
#> 2 2018           2017            0.327

Created on 2022-07-27 by the reprex package (v2.0.1)
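
For comparison, the plain for-loop mentioned at the top of this answer might look like the sketch below. It is not part of the original reprex: it rebuilds the example from vegan's built-in dune data, and the names year, yrs and res are made up for illustration. It also assumes you want the years in chronological order, so the levels are set explicitly rather than taken from recode_factor().

```r
library(vegan)    # vegdist() and the dune data
library(usedist)  # dist_between_centroids()

data(dune)
data(dune.env)

# Assumption: the three levels of Use stand in for consecutive years,
# listed here in chronological order
year <- factor(as.character(dune.env$Use),
               levels = c("Hayfield", "Haypastu", "Pasture"),
               labels = c("2017", "2018", "2019"))

bio_distmat <- vegdist(dune, method = "bray")

# One row per pair of consecutive years; distances are filled in by the loop
yrs <- levels(year)
res <- data.frame(
  start_centroid = yrs[-length(yrs)],
  end_centroid   = yrs[-1],
  distance       = NA_real_
)

for (i in seq_len(nrow(res))) {
  idx1 <- which(year == res$start_centroid[i])  # samples in the start year
  idx2 <- which(year == res$end_centroid[i])    # samples in the end year
  res$distance[i] <- dist_between_centroids(bio_distmat, idx1, idx2)
}

res
```

Because the groups are looked up with which(), the rows do not need to be sorted by year first, and the same loop should work unchanged for ~20 years.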

Upvotes: 1

Jari Oksanen

Reputation: 3702

vegan::adonis (or vegan::adonis2) does not return that information, but vegan::betadisper does. Its result object contains the element distances, which holds the distance of each sample to its respective group centroid, and the element group, which holds the corresponding group of each sample. If you want only one group, you must give a constant vector as the group.
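
A minimal sketch of pulling the distances and their groups into a data frame, assuming vegan's built-in dune data with Use as the grouping variable (a stand-in for year, as in the question). The names mod, dist_to_centroid and mod_one are illustrative only.

```r
library(vegan)

data(dune)
data(dune.env)

bio_distmat <- vegdist(dune, method = "bray")

# One group per level of Use
mod <- betadisper(bio_distmat, group = dune.env$Use)

# Distance of each sample to the centroid of its own group
dist_to_centroid <- data.frame(
  group    = mod$group,
  distance = unname(mod$distances)
)
head(dist_to_centroid)

# Single-group case: pass a constant vector as the group
mod_one <- betadisper(bio_distmat, group = rep("all", nrow(dune)))
head(mod_one$distances)
```

Note that these are distances from samples to centroids, not distances between the centroids themselves.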

Upvotes: 1
