JGibbRandomNumber

Reputation: 93

How Does R Calculate the False Discovery Rate

I appear to be getting inconsistent results when I use R's p.adjust function to calculate the False Discovery Rate. Based on the paper cited in the documentation, the adjusted p-value at rank i should be calculated like this:

adjusted_p_at_index_i = p_at_index_i * (total_number_of_tests / i)

Now when I run p.adjust(c(0.0001, 0.0004, 0.0019),"fdr") I get the expected results of

c(0.0003, 0.0006, 0.0019)

but when I run p.adjust(c(0.517479039, 0.003657195, 0.006080152),"fdr") I get this

c(0.517479039, 0.009120228, 0.009120228)

Instead of the result I calculate:

c(0.517479039, 0.010971585, 0.009120228)
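
That expected vector is just the formula above applied to each p-value, taking i to be the p-value's rank when the vector is sorted ascending; roughly this calculation:

p <- c(0.517479039, 0.003657195, 0.006080152)
m <- length(p)
i <- rank(p)    # rank of each p-value when sorted ascending: 3 1 2
p * (m / i)     # 0.517479039 0.010971585 0.009120228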

What is R doing to the data that can account for both of these results?

Upvotes: 9

Views: 9557

Answers (1)

David Robinson

Reputation: 78590

The reason is that the FDR calculation ensures that FDR never increases as the p-value decreases. That's because you can always choose to set a higher threshold for your rejection rule if that higher threshold will get you a lower FDR.

In your case, your second hypothesis had a p-value of 0.003657195 and, taken on its own, an FDR of 0.010971585, but the third hypothesis had a larger p-value (0.006080152) and a smaller FDR (0.009120228). If you can achieve an FDR of 0.009120228 by setting your p-value threshold to 0.006080152, there is never a reason to set a lower threshold just to get a higher FDR.
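
Concretely, for your second example (a quick sketch of that capping step, not the internal p.adjust code):

p   <- sort(c(0.517479039, 0.003657195, 0.006080152))
m   <- length(p)
raw <- p * m / seq_len(m)   # naive p * m/i in ascending order: 0.010971585 0.009120228 0.517479039
rev(cummin(rev(raw)))       # running minimum from the largest p-value down: 0.009120228 0.009120228 0.517479039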

You can see this in the code by typing p.adjust:

...
}, BH = {
    i <- lp:1L
    o <- order(p, decreasing = TRUE)
    ro <- order(o)
    pmin(1, cummin(n/i * p[o]))[ro]
},
...

The cummin function takes the cumulative minimum of the vector. Because the p-values are processed from largest to smallest here, each adjusted value is capped at the adjusted value of every larger p-value.
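
Running those lines by hand on your second example (same variable names as the snippet, with n and lp both equal to 3) shows where the running minimum kicks in:

p  <- c(0.517479039, 0.003657195, 0.006080152)
lp <- length(p); n <- lp
i  <- lp:1L                        # 3 2 1
o  <- order(p, decreasing = TRUE)  # 1 3 2, i.e. p sorted largest-first
ro <- order(o)                     # 1 3 2, maps the sorted results back to the input order
n/i * p[o]                         # 0.517479039 0.009120228 0.010971585
cummin(n/i * p[o])                 # 0.517479039 0.009120228 0.009120228
pmin(1, cummin(n/i * p[o]))[ro]    # 0.517479039 0.009120228 0.009120228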

You can see this in the Benjamini-Hochberg paper cited in the documentation, specifically in the definition of the procedure on page 293, which states:

let k be the largest i for which P_(i) <= (i/m) q*;

then reject all H_(i), i = 1, 2, ..., k
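
That step-up rule rejects exactly the hypotheses whose BH-adjusted p-value is at most q*. Here is a sketch of the rule as stated in the paper; the function name bh_reject and the level q = 0.01 are just for illustration:

bh_reject <- function(p, q) {
  m <- length(p)
  ps <- sort(p)
  k <- max(c(0L, which(ps <= seq_len(m) / m * q)))  # largest i with P_(i) <= (i/m) q*
  thresh <- if (k > 0) ps[k] else -Inf              # reject everything up to P_(k)
  p <= thresh
}

bh_reject(c(0.517479039, 0.003657195, 0.006080152), q = 0.01)
# FALSE TRUE TRUE, the same calls as p.adjust(p, "fdr") <= 0.01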

Upvotes: 10
