Ilya
Ilya

Reputation: 41

How to change an order of samples in Tukey's test in R?

Problem: I would like to learn how can I change an order of samples for which Tukey's test in R calculates means and assign corresponding letters. Very simple example is below.

I played with iris data and have found that there are differences in Sepal.Length among different Species. Here is the boxplot:

enter image description here

I conducted an ANOVA test and have found that differences are statistically significant.

> fit <- lm(Sepal.Length ~ Species, data = iris)
> summary(aov(fit))

             Df Sum Sq Mean Sq F value Pr(>F)    
Species       2  63.21  31.606   119.3 <2e-16 ***
Residuals   147  38.96   0.265                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Then I conducted Tukey's test and got following:

> library(agricolae)
> HSD.test(fit, "Species", group=T, console=T)

Study: fit ~ "Species"

HSD Test for Sepal.Length 

Mean Square Error:  0.2650082 

Species,  means

           Sepal.Length       std  r Min Max
setosa            5.006 0.3524897 50 4.3 5.8
versicolor        5.936 0.5161711 50 4.9 7.0
virginica         6.588 0.6358796 50 4.9 7.9

alpha: 0.05 ; Df Error: 147 
Critical Value of Studentized Range: 3.348424 

Honestly Significant Difference: 0.2437727 

Means with the same letter are not significantly different.

Groups, Treatments and means
a    virginica       6.588 
b    versicolor      5.936 
c    setosa          5.006

According the table of groups, HSD.test function sorts means into descending order and then assign letters. Thus, "virginica" have the largest mean, therefore it is the first in the table.

Questions: Is there any way to change the default sorting and assigning of letters? Can I sort samples in ascending order of means and then assign letters. An expected output is following:

a setosa     5.006
b versicolor 5.936
c virginica  6.588

Possible solution: In package multcomp there are two functions which can do it working together:

1 - glht do Tukey's test

> an <- aov(fit)
> library(multcomp)
> glht(an, linfct = mcp(Species = "Tukey"))

         General Linear Hypotheses

    Multiple Comparisons of Means: Tukey Contrasts


    Linear Hypotheses:
                                Estimate
    versicolor - setosa == 0       0.930
    virginica - setosa == 0        1.582
    virginica - versicolor == 0    0.652

2 - cld can provide me with letters assigned to Species according levels of the factor iris$Species

> cld(glht(an, linfct = mcp(Species = "Tukey")))
    setosa versicolor  virginica 
       "a"        "b"        "c" 

Unfortunately, glht function do not show another data which can be useful and needed for creating of barplot (means, std, p-values). Of course, I can do it separately with another special functions, or just use both HSD.test and cld. But I would prefer to solve a problem with sorting of means in HSD.test function and use only this one.

Upvotes: 4

Views: 4155

Answers (3)

Pipe Marulanda
Pipe Marulanda

Reputation: 61

It is also possible to solve with multcompLetters() and TukeyHSD(). You should change the parameter "reversed"

library(multcompView)

fit <- aov(Sepal.Length ~ Species, data = iris)

tukey<-TukeyHSD(fit, ordered = T)
tukey_1<-multcompLetters2(Sepal.Length ~ Species,
                          tukey$Species[,"p adj"],
                          iris,reversed = T)
tukey_2<-multcompLetters2(Sepal.Length ~ Species,
                          tukey$Species[,"p adj"],
                          iris,reversed = F)
tukey_1
tukey_2
tapply(iris$Sepal.Length, iris$Species, mean)

Upvotes: 0

Juan Carlos
Juan Carlos

Reputation: 335

First of all, thanks for the function. It was what I was looking for. But I think that there is a mistake in

res <- vector(mode = "character", length=length(trt)), 

it should be

res <- vector(mode = "character", length=length("trt")) 

Upvotes: -1

bluefish
bluefish

Reputation: 405

I notice it's a bit late to answer this question. However I ran into the exact same problem and like to share my solution as future reference. Hope it helps someday someone.

first option

One could use multcompLetters() for example with the results from TukeyHSD(). However, this doesn't allow an arbitrary ordering of the result and is not so easy to use.

second option

Since I needed an arbitrary order I wrote my own function, which takes a vector of letters like returned from HSD.test and swaps the letters in a way the result is nice. Meaning letters first in alphabet appearing first.

library(agricolae)
reorder<-function(inV){
  collapsed <- paste(inV,sep="",collapse = "")
  u <- unique(strsplit(collapsed,"")[[1]])
  if(length(u)<2){
    return(inV)
  }
  u <- u[order(u)]
  m <- matrix(nrow=NROW(inV),ncol=length(u))
  m[]<-F
  for(i in 1:length(inV)){
    s <- strsplit(inV[i],"")[[1]]
    index <- match(s,u)
    m[i,index] <- T
  }
  for(i in 1:(length(u)-1)){
    firstColT <- match(T,m[,i])[1] #first row with true in current column
    firstT <- match(T,rowSums(m[,i:length(u)] > 0))[1] #first row with true in rest
    if(firstT < firstColT){
      colT <- match(T,m[firstT,i:length(u)])[1]
      colT <- colT + i - 1 #correct index for leftout columns in match
      tmp <- m[,colT]
      m[,colT] <- m[,i]
      m[,i] <- tmp
    }
  }
  res <- vector(mode = "character", length=length(trt))
  for(i in 1:length(inV)){
    l <- u[m[i,]]
    res[i] <- paste(l,sep="",collapse = "")
  }
  return(res)
}

fit <- lm(Sepal.Length ~ Species, data = iris)
a <- HSD.test(fit, "Species", group=T, console=F)$groups
a <- a[rev(rownames(a)),] #order the result the way you want
a$M <- reorder(as.character(a$M))

For the example it is a bit of an overkill but it should also work for more complicated cases.

Upvotes: 2

Related Questions