Looking to filter and/or tidy up TukeyHSD pairwise comparisons for two-way ANOVA

Question

I'm running a 2-way ANOVA to compare how two variables affect the price of a service. However, when I run post-hoc pairwise comparisons I would like to only see rows that are comparable (i.e. when one of the independent variables matches). Here is an example to show the issue:

Variable 1 - Species: tuna, halibut, salmon, flounder

Variable 2 - Sex: Male, female

Dependent variable: mass

Here is what I'm using for 2-way ANOVA and pairwise comparisons:

> dat <- read.csv()
> aov2 <- aov(mass ~ species * sex, data = dat)
> TukeyHSD(aov2, which = "species:sex")

When I run the pairwise comparisons of mass using TukeyHSD, I would only like to see ones such as Tuna:Male - Tuna:Female or Halibut:Male - Flounder:Male, but I would not like to see Flounder:Female - Halibut:Male. Basically, I just want to see comparisons where one of the variables has the same value.

Honestly am not even sure what to try, or if this is even possible. I had some NA values for some of the comparisons and I tried using:

TukeyHSD(aov2, which = "species:sex") %>% filter(is.na(diff))

but I get the following:

Error in UseMethod("filter") : 
no applicable method for 'filter' applied to an object of class "c('TukeyHSD', 'multicomp')"

If this is not possible, is there a way to filter data from a dataframe before it goes into 1-way ANOVA, for example by a certain value in one of the columns? Then I can try to run multiple 1-way ANOVAs to accomplish a similar goal. I would rather not have to create separate data files because the actual dataset I'm working with is very large and would be time-consuming to manually separate.

Edward · Accepted Answer

This is not possible with the current TukeyHSD function. However, with some modifications of this function, you can get the desired output. Below is the modified function, which adds a new argument called "match", the default being FALSE:

TukeyHSD2 <- function (x, which = seq_along(tabs), ordered = FALSE, conf.level = 0.95, 
          match=FALSE, ...) 
{
  mm <- model.tables(x, "means")
  if (is.null(mm$n)) 
    stop("no factors in the fitted model")
  tabs <- mm$tables
  if (names(tabs)[1L] == "Grand mean") 
    tabs <- tabs[-1L]
  tabs <- tabs[which]
  nn <- mm$n[names(tabs)]
  nn_na <- is.na(nn)
  if (all(nn_na)) 
    stop("'which' specified no factors")
  if (any(nn_na)) {
    warning("'which' specified some non-factors which will be dropped")
    tabs <- tabs[!nn_na]
    nn <- nn[!nn_na]
  }
  out <- setNames(vector("list", length(tabs)), names(tabs))
  MSE <- sum(x$residuals^2)/x$df.residual
  for (nm in names(tabs)) {
    tab <- tabs[[nm]]
    means <- as.vector(tab)
    nms <- if (length(dim(tab)) > 1L) {
      dn <- dimnames(tab)
      apply(do.call("expand.grid", dn), 1L, paste, collapse = ":")
    }
    else names(tab)
    n <- nn[[nm]]
    if (length(n) < length(means)) 
      n <- rep.int(n, length(means))
    if (as.logical(ordered)) {
      ord <- order(means)
      means <- means[ord]
      n <- n[ord]
      if (!is.null(nms)) 
        nms <- nms[ord]
    }
    
    center <- outer(means, means, `-`)
    keep <- lower.tri(center)
    
    dnames <- list(outer(nms, nms, paste, sep = "-"), c("diff", "lwr", "upr", "p adj"))
    dk1 <- dnames[[1L]][keep]
    
    if(match) {
      
      keep2 <- sapply(1:length(dk), function(x) {
        pattern <- '(.+):(.+)-(.+):(.+)'
        d1 <- sub(pattern, '\1', dk1[x])
        d2 <- sub(pattern, '\2', dk1[x])
        d3 <- sub(pattern, '\3', dk1[x])
        d4 <- sub(pattern, '\4', dk1[x])
        d1==d3 | d2==d4 } )
      
      dnames[[1L]] <- dk1[keep2]
    } else dnames[[1L]] <- dk1

    center <- center[keep]
    width <- qtukey(conf.level, length(means), x$df.residual) * 
      sqrt((MSE/2) * outer(1/n, 1/n, `+`))[keep]
    est <- center/(sqrt((MSE/2) * outer(1/n, 1/n, `+`))[keep])
    
    if(match) {
      center <- center[keep2]
      width <- width[keep2]
      est <- est[keep2]
    }
    
    pvals <- ptukey(abs(est), length(means), x$df.residual, 
                    lower.tail = FALSE)
    out[[nm]] <- array(c(center, center - width, center + 
                        width, pvals), c(length(width), 4L), dnames)
  }
  class(out) <- c("TukeyHSD", "multicomp")
  attr(out, "orig.call") <- x$call
  attr(out, "conf.level") <- conf.level
  attr(out, "ordered") <- ordered
  out
}

Send the above function to your R console and then test it on your data. Here is the output using the "warpbreaks" example dataset.

fm1 <- aov(breaks ~ wool * tension, data = warpbreaks)

> TukeyHSD(fm1, which="wool:tension")

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = breaks ~ wool * tension, data = warpbreaks)

$`wool:tension`
               diff       lwr        upr     p adj
B:L-A:L -16.3333333 -31.63966  -1.027012 0.0302143
A:M-A:L -20.5555556 -35.86188  -5.249234 0.0029580
B:M-A:L -15.7777778 -31.08410  -0.471456 0.0398172
A:H-A:L -20.0000000 -35.30632  -4.693678 0.0040955
B:H-A:L -25.7777778 -41.08410 -10.471456 0.0001136
A:M-B:L  -4.2222222 -19.52854  11.084100 0.9626541
B:M-B:L   0.5555556 -14.75077  15.861877 0.9999978
A:H-B:L  -3.6666667 -18.97299  11.639655 0.9797123
B:H-B:L  -9.4444444 -24.75077   5.861877 0.4560950
B:M-A:M   4.7777778 -10.52854  20.084100 0.9377205
A:H-A:M   0.5555556 -14.75077  15.861877 0.9999978
B:H-A:M  -5.2222222 -20.52854  10.084100 0.9114780
A:H-B:M  -4.2222222 -19.52854  11.084100 0.9626541
B:H-B:M -10.0000000 -25.30632   5.306322 0.3918767
B:H-A:H  -5.7777778 -21.08410   9.528544 0.8705572

And here is the output from the modified version with "match" set to TRUE.

> TukeyHSD2(fm1, which="wool:tension", match=TRUE)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = breaks ~ wool * tension, data = warpbreaks)

$`wool:tension`
               diff       lwr       upr     p adj
B:L-A:L -16.3333333 -31.63966 -1.027012 0.0302143
A:M-A:L -20.5555556 -35.86188 -5.249234 0.0029580
A:H-A:L -20.0000000 -35.30632 -4.693678 0.0040955
B:M-B:L   0.5555556 -14.75077 15.861877 0.9999978
B:H-B:L  -9.4444444 -24.75077  5.861877 0.4560950
B:M-A:M   4.7777778 -10.52854 20.084100 0.9377205
A:H-A:M   0.5555556 -14.75077 15.861877 0.9999978
B:H-B:M -10.0000000 -25.30632  5.306322 0.3918767
B:H-A:H  -5.7777778 -21.08410  9.528544 0.8705572

Looking to filter and/or tidy up TukeyHSD pairwise comparisons for two-way ANOVA

Answers (1)

Related Questions