empetrum
empetrum

Reputation: 115

Computing p-values when using pvclust with Bray-Curtis similarity

Using species abundance data recorded for multiple samples, I want to create a dendrogram where branches represent the similarity of samples. The distance measure should be Bray-Curtis-similarity. For each node, I want to obtain p-values indicating stability by bootstrapping. The function pvclust() from the R package pvclust seems to be suitable for this problem. Unfortunately, Bray-Curtis-similarity is not natively implemented as a distance measure in pvclust. However, it is possible to pass a function which returns a dist object to the method.dist argument of pvclust. vegdist from the vegan package seems suitable.

Following the approach from an answer, I found here: Error when using ''ward'' method with pvclust R package, my current approach is the following (using the dune dataset from the package vegan):

# load libraries
library(vegan)
library(pvclust)

# load example data
data(dune)

# run pvclust
pv_result <- pvclust(dune,
                     method.hclust = "ward.D",
                     method.dist = function(x){vegan::vegdist(x, "bray")},
                     n = 1000,
                     parallel = TRUE)

# plot result
plot(pv_result)

This returns the following dendrogram:

enter image description here

The calculation of p-values does not seem to work, since they are 0 for all nodes. I am looking for a way to calculate the p-values. Also approaches using alternatives to pvclust are welcome.

Upvotes: 1

Views: 201

Answers (1)

Jari Oksanen
Jari Oksanen

Reputation: 3712

Function pvclust seems to transpose input data and calculate the clustering for columns (variables) instead of rows (observations). This means that you need to transpose input in your method.dist function as well. Moreover, if you want to have clustering for rows, you must undo the transpose by transposing your input. You should modify your call as:

library(pvclust)
library(vegan)
data(dune)
pvclust(t(dune), # to cluster rows - omit t() to cluster columns
   method.dist = function(x) vegan::vegdist(t(x), "bray"), 
   n = 1000) # and other arguments you need

NB, you need t(x) in the vegdist call in all cases.

Upvotes: 2

Related Questions