Why is the aggregate function being applied to the grouping column?

Question

Here is a simple aggregate:

dat = read.table(textConnection(
  'ID value
  1 4
  1 7
  2 8
  2 3
  2 3'), header = TRUE)

aggregate(dat,by=list("type"=dat$ID),sum)

I get this output:

       type ID value
    1    1  2    11
    2    2  6    14

I wonder:
1. In the first row, why is the ID 2?
2. In the second row, why is the ID 6?

Matthew Lundberg · Accepted Answer

You requested the sum of every column, grouped by dat$ID. With this interface, aggregate applies the function to all columns of the data frame you pass in. Because dat$ID is supplied as a plain vector, aggregate has no way to know it is also a column of dat, so the ID column is not removed from the aggregated results and sum is applied to ID within each group as well.

For the first row, you are computing with(dat, sum(ID[dat$ID==1])), or 1+1 = 2.
For the second row, you are computing with(dat, sum(ID[dat$ID==2])), or 2+2+2 = 6.
(It is intentional that I specified dat$ID in each index, rather than ID, as that is what your aggregate call is doing.)
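
As a quick check of the explanation above, here is a minimal sketch using the question's data. It uses tapply (not part of the original answer) to sum the ID column within each group, and compares that with the ID column produced by the original aggregate call. The names id_sums and res are just illustrative.

```r
# Rebuild the data frame from the question.
dat <- read.table(text = "ID value
1 4
1 7
2 8
2 3
2 3", header = TRUE)

# Sum the ID column within each group defined by dat$ID.
id_sums <- tapply(dat$ID, dat$ID, sum)

# The same call as in the question; sum is applied to both ID and value.
res <- aggregate(dat, by = list(type = dat$ID), sum)

print(id_sums)  # per-group sums of ID
print(res)      # the ID column holds those same sums
```

The type column holds the actual group labels, while the ID column holds the summed IDs.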

Using the formula interface to aggregate is cleaner, and gives what you seem to want. Using this interface, aggregate gives the sum of the value column, with ID as it appears in each aggregated group:

> aggregate(value ~ ID, data=dat, sum)
  ID value
1  1    11
2  2    14
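
If you would rather keep the non-formula interface, one alternative (a sketch, not something the answer above proposes) is to pass only the columns you want summarised, so the grouping column is not among them. The name res2 is illustrative.

```r
# Rebuild the data frame from the question.
dat <- read.table(text = "ID value
1 4
1 7
2 8
2 3
2 3", header = TRUE)

# dat["value"] is a one-column data frame, so only value is summed.
# The grouping column takes its name from the list element in `by`.
res2 <- aggregate(dat["value"], by = list(ID = dat$ID), FUN = sum)

print(res2)
```

This should give the same two columns, ID and value, as the formula version.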
