adn bps

Reputation: 639

Understanding the source code of "ave" function

Here is the source code for the "ave" function in R:

function (x, ..., FUN = mean) 
{
    if (missing(...)) 
        x[] <- FUN(x)
    else {
        g <- interaction(...)
        split(x, g) <- lapply(split(x, g), FUN)
    }
    x
}

I am having trouble understanding how the assignment, "split(x, g) <- lapply(split(x, g), FUN)" works. Consider the following example:

# Overview: function inputs and outputs
> x = 10*1:6
> g = c('a', 'b', 'a', 'b', 'a', 'b')
> ave(x, g)
[1] 30 40 30 40 30 40

# Individual components of "split" assignment
> split(x, g)
$a
[1] 10 30 50
$b
[1] 20 40 60
> lapply(split(x, g), mean)
$a
[1] 30
$b
[1] 40

# Examine "x" before and after assignment
> x
[1] 10 20 30 40 50 60
> split(x, g) <- lapply(split(x, g), mean)
> x
[1] 30 40 30 40 30 40

Questions:

• Why does the assignment, "split(x,g) <- lapply(split(x,g), mean)", directly modify x? Does "<-" always modify the first argument of a function, or is there some other rule for this?

• How does this assignment even work? Both the "split" and "lapply" statements have lost the original ordering of x. They are also length 2. How do you end up with a vector of length(x) that matches the original ordering of x?

Upvotes: 1

Views: 532

Answers (1)

Stefan F

Reputation: 2753

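This is a tricky one: <- does not usually work this way. You are not actually calling split() here; you are calling a replacement function named `split<-`(). R evaluates split(x, g) <- value as x <- `split<-`(x, g, value = value), which is why x itself is modified. The same rule applies to any call on the left-hand side of <-, provided a matching replacement function exists. The documentation of split says: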

[...] The replacement forms replace values corresponding to such a division. unsplit reverses the effect of split.
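
To make the first question concrete, here is a small check using the x and g from the question. It compares the replacement-call syntax with an explicit call to `split<-`. The names x1, x2 and means are only introduced for this illustration.

```r
x <- 10 * 1:6
g <- c("a", "b", "a", "b", "a", "b")
means <- lapply(split(x, g), mean)

# Replacement-call syntax: R looks up the function `split<-`
x1 <- x
split(x1, g) <- means

# The same thing spelled out: call `split<-` and assign the result back
x2 <- `split<-`(x, g, value = means)

identical(x1, x2)
```

So <- does not modify "the first argument of a function" in general. It only works when a function named `fun<-` exists, and what gets reassigned is the variable given as the first argument of that call.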

See also this answer
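
As for the second question (how the original ordering comes back), the following is a minimal sketch of what a split replacement could look like. It is an illustration, not a copy of the base R source, and split_assign_sketch is a hypothetical name. The idea is to split the positions of x rather than its values, so each group remembers where its elements live.

```r
# Hypothetical helper written for illustration only
split_assign_sketch <- function(x, f, value) {
  # Split the positions 1..length(x) by group, not the values themselves
  idx <- split(seq_along(x), f)
  n <- length(value)
  j <- 0
  for (i in idx) {
    j <- j %% n + 1L      # walk through value, wrapping around if it is shorter
    x[i] <- value[[j]]    # a length-1 value is recycled over all positions in i
  }
  x
}

x <- 10 * 1:6
g <- c("a", "b", "a", "b", "a", "b")

idx <- split(seq_along(x), g)   # positions per group: a -> 1 3 5, b -> 2 4 6
result <- split_assign_sketch(x, g, lapply(split(x, g), mean))
```

Because the groups are matched to positions in x, the single mean for "a" is written to positions 1, 3 and 5 and the mean for "b" to positions 2, 4 and 6, which gives a vector of length(x) in the original order.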

Upvotes: 5
