Phaaltu Waaltu
Phaaltu Waaltu

Reputation: 5

Looking to filter and/or tidy up TukeyHSD pairwise comparisons for two-way ANOVA

I'm running a 2-way ANOVA to compare how two variables affect the price of a service. However, when I run post-hoc pairwise comparisons I would like to only see rows that are comparable (i.e. when one of the independent variables matches). Here is an example to show the issue:

Variable 1 - Species: tuna, halibut, salmon, flounder

Variable 2 - Sex: Male, female

Dependent variable: mass

Here is what I'm using for 2-way ANOVA and pairwise comparisons:

> dat <- read.csv(<file location>)
> aov2 <- aov(mass ~ species * sex, data = dat)
> TukeyHSD(aov2, which = "species:sex")

When I run the pairwise comparisons of mass using TukeyHSD, I would only like to see ones such as Tuna:Male - Tuna:Female or Halibut:Male - Flounder:Male, but I would not like to see Flounder:Female - Halibut:Male. Basically, I just want to see comparisons where one of the variables has the same value.

Honestly am not even sure what to try, or if this is even possible. I had some NA values for some of the comparisons and I tried using:

TukeyHSD(aov2, which = "species:sex") %>% filter(is.na(diff))

but I get the following:

Error in UseMethod("filter") : 
no applicable method for 'filter' applied to an object of class "c('TukeyHSD', 'multicomp')"

If this is not possible, is there a way to filter data from a dataframe before it goes into 1-way ANOVA, for example by a certain value in one of the columns? Then I can try to run multiple 1-way ANOVAs to accomplish a similar goal. I would rather not have to create separate data files because the actual dataset I'm working with is very large and would be time-consuming to manually separate.

Upvotes: 0

Views: 274

Answers (1)

Edward
Edward

Reputation: 19349

This is not possible with the current TukeyHSD function. However, with some modifications of this function, you can get the desired output. Below is the modified function, which adds a new argument called "match", the default being FALSE:

TukeyHSD2 <- function (x, which = seq_along(tabs), ordered = FALSE, conf.level = 0.95, 
          match=FALSE, ...) 
{
  mm <- model.tables(x, "means")
  if (is.null(mm$n)) 
    stop("no factors in the fitted model")
  tabs <- mm$tables
  if (names(tabs)[1L] == "Grand mean") 
    tabs <- tabs[-1L]
  tabs <- tabs[which]
  nn <- mm$n[names(tabs)]
  nn_na <- is.na(nn)
  if (all(nn_na)) 
    stop("'which' specified no factors")
  if (any(nn_na)) {
    warning("'which' specified some non-factors which will be dropped")
    tabs <- tabs[!nn_na]
    nn <- nn[!nn_na]
  }
  out <- setNames(vector("list", length(tabs)), names(tabs))
  MSE <- sum(x$residuals^2)/x$df.residual
  for (nm in names(tabs)) {
    tab <- tabs[[nm]]
    means <- as.vector(tab)
    nms <- if (length(dim(tab)) > 1L) {
      dn <- dimnames(tab)
      apply(do.call("expand.grid", dn), 1L, paste, collapse = ":")
    }
    else names(tab)
    n <- nn[[nm]]
    if (length(n) < length(means)) 
      n <- rep.int(n, length(means))
    if (as.logical(ordered)) {
      ord <- order(means)
      means <- means[ord]
      n <- n[ord]
      if (!is.null(nms)) 
        nms <- nms[ord]
    }
    
    center <- outer(means, means, `-`)
    keep <- lower.tri(center)
    
    dnames <- list(outer(nms, nms, paste, sep = "-"), c("diff", "lwr", "upr", "p adj"))
    dk1 <- dnames[[1L]][keep]
    
    if(match) {
      
      keep2 <- sapply(1:length(dk), function(x) {
        pattern <- '(.+):(.+)-(.+):(.+)'
        d1 <- sub(pattern, '\\1', dk1[x])
        d2 <- sub(pattern, '\\2', dk1[x])
        d3 <- sub(pattern, '\\3', dk1[x])
        d4 <- sub(pattern, '\\4', dk1[x])
        d1==d3 | d2==d4 } )
      
      dnames[[1L]] <- dk1[keep2]
    } else dnames[[1L]] <- dk1

    center <- center[keep]
    width <- qtukey(conf.level, length(means), x$df.residual) * 
      sqrt((MSE/2) * outer(1/n, 1/n, `+`))[keep]
    est <- center/(sqrt((MSE/2) * outer(1/n, 1/n, `+`))[keep])
    
    if(match) {
      center <- center[keep2]
      width <- width[keep2]
      est <- est[keep2]
    }
    
    pvals <- ptukey(abs(est), length(means), x$df.residual, 
                    lower.tail = FALSE)
    out[[nm]] <- array(c(center, center - width, center + 
                        width, pvals), c(length(width), 4L), dnames)
  }
  class(out) <- c("TukeyHSD", "multicomp")
  attr(out, "orig.call") <- x$call
  attr(out, "conf.level") <- conf.level
  attr(out, "ordered") <- ordered
  out
}

Send the above function to your R console and then test it on your data. Here is the output using the "warpbreaks" example dataset.

fm1 <- aov(breaks ~ wool * tension, data = warpbreaks)

> TukeyHSD(fm1, which="wool:tension")

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = breaks ~ wool * tension, data = warpbreaks)

$`wool:tension`
               diff       lwr        upr     p adj
B:L-A:L -16.3333333 -31.63966  -1.027012 0.0302143
A:M-A:L -20.5555556 -35.86188  -5.249234 0.0029580
B:M-A:L -15.7777778 -31.08410  -0.471456 0.0398172
A:H-A:L -20.0000000 -35.30632  -4.693678 0.0040955
B:H-A:L -25.7777778 -41.08410 -10.471456 0.0001136
A:M-B:L  -4.2222222 -19.52854  11.084100 0.9626541
B:M-B:L   0.5555556 -14.75077  15.861877 0.9999978
A:H-B:L  -3.6666667 -18.97299  11.639655 0.9797123
B:H-B:L  -9.4444444 -24.75077   5.861877 0.4560950
B:M-A:M   4.7777778 -10.52854  20.084100 0.9377205
A:H-A:M   0.5555556 -14.75077  15.861877 0.9999978
B:H-A:M  -5.2222222 -20.52854  10.084100 0.9114780
A:H-B:M  -4.2222222 -19.52854  11.084100 0.9626541
B:H-B:M -10.0000000 -25.30632   5.306322 0.3918767
B:H-A:H  -5.7777778 -21.08410   9.528544 0.8705572

And here is the output from the modified version with "match" set to TRUE.

> TukeyHSD2(fm1, which="wool:tension", match=TRUE)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = breaks ~ wool * tension, data = warpbreaks)

$`wool:tension`
               diff       lwr       upr     p adj
B:L-A:L -16.3333333 -31.63966 -1.027012 0.0302143
A:M-A:L -20.5555556 -35.86188 -5.249234 0.0029580
A:H-A:L -20.0000000 -35.30632 -4.693678 0.0040955
B:M-B:L   0.5555556 -14.75077 15.861877 0.9999978
B:H-B:L  -9.4444444 -24.75077  5.861877 0.4560950
B:M-A:M   4.7777778 -10.52854 20.084100 0.9377205
A:H-A:M   0.5555556 -14.75077 15.861877 0.9999978
B:H-B:M -10.0000000 -25.30632  5.306322 0.3918767
B:H-A:H  -5.7777778 -21.08410  9.528544 0.8705572

Upvotes: 0

Related Questions