Rover Eye

Reputation: 237

Difference between mean and manual calculation in R?

I am writing a simple function in R to calculate percentage differences between two input numbers.

pdiff <- function(a, b) {
  if (length(a) >= 1) a <- median(a)
  if (length(b) >= 1) b <- median(b)
  (abs(a - b) / ((a + b) / 2)) * 100
}

pdiffa <- function(a, b) {
  if (length(a) >= 1) a <- median(a)
  if (length(b) >= 1) b <- median(b)
  (abs(a - b) / mean(a, b)) * 100
}

When I run them with arbitrary values of a and b, the two functions give different results:

x <- 5
y <- 10
pdiff(x, y)   # gives 66.67%
pdiffa(x, y)  # gives 100%

When I step through the code, the values of (x+y)/2 = 7.5 and mean(x,y) = 5 differ. Am I missing something really obvious and stupid here?

Upvotes: 5

Views: 1355

Answers (2)

Ben Bolker

Reputation: 226936

This is due to a nasty "gotcha" in the mean() function (not listed in the list of R traps, but probably should be): you want mean(c(a,b)), not mean(a,b). From ?mean:

mean(x, ...)
[snip snip snip]
... further arguments passed to or from other methods.

So what happens if you call mean(5,10)? mean calls the mean.default method, which has trim as its second argument:

trim the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.

The last phrase, "values of trim outside that range are taken as the nearest endpoint", means that values of trim larger than 0.5 are set to 0.5. That asks mean to throw away 50% of the data from each end of the data set, so all that is left is the median. Debugging our way through mean.default, we see that we do end up at this code:

if (trim >= 0.5)
  return(stats::median(x, na.rm = FALSE))

So mean(x, <value_greater_than_0.5>) returns the median of x, which here is c(5), so the result is just 5.
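A quick console check may make this easier to see. This is only a sketch using the question's values; the explicit trim = 10 call is added here for illustration and is not part of the original post.

```r
x <- 5
y <- 10

# The second positional argument is matched to `trim`, not treated as data.
positional <- mean(x, y)         # 5

# Spelling the argument out gives the same result: trim is clamped to 0.5,
# so mean.default returns the median of x.
explicit <- mean(x, trim = 10)   # 5
med <- median(x)                 # 5

# Combining the values into a single vector averages both of them.
combined <- mean(c(x, y))        # 7.5
```

All three of the first calls only ever see x as data, which is why y has no influence on the result.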

Upvotes: 13

neilfws

Reputation: 33812

Try mean(5, 10) by itself.

mean(5, 10)
[1] 5

Now try mean(c(5, 10)).

mean(c(5, 10))
[1] 7.5

mean takes a vector as its first argument, so the values to average must be combined into one vector with c().
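Applied to the function from the question, a possible fix might look like the sketch below. The name pdiff_fixed is hypothetical, and the length(a) > 1 check is an assumption about what the original condition was meant to test.

```r
# Percentage difference relative to the average of the two values.
# Vectors longer than one element are first reduced to their medians
# (assumed intent of the length check in the question).
pdiff_fixed <- function(a, b) {
  if (length(a) > 1) a <- median(a)
  if (length(b) > 1) b <- median(b)
  # c(a, b) puts both values into the data argument of mean()
  abs(a - b) / mean(c(a, b)) * 100
}

result <- pdiff_fixed(5, 10)  # 66.66667, same as (abs(a - b) / ((a + b) / 2)) * 100
```

With the two values wrapped in c(), the result agrees with the manual (a + b) / 2 version.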

Upvotes: 5
