Bqsj Sjbq

Reputation: 1281

Why is the aggregate function being applied to the grouping column?

Here is a simple aggregate:

dat = read.table(textConnection(
  'ID value
  1 4
  1 7
  2 8
  2 3
  2 3'), header = TRUE)

aggregate(dat,by=list("type"=dat$ID),sum)

I get this output:

       type ID value
    1    1  2    11
    2    2  6    14

I wonder:
1. In the first row, why is ID 2?
2. In the second row, why is ID 6?

Upvotes: 0

Views: 145

Answers (1)

Matthew Lundberg

Reputation: 42639

You requested the sum of each column, grouped by dat$ID. With this interface, every column of dat is aggregated. dat$ID is simply a vector, so aggregate has no way to tell that it is the same as the ID column; the column is therefore not removed from the aggregated results, and sum is applied to ID within each group as well.

For the first row, you are computing with(dat, sum(ID[dat$ID==1])), which is 1+1 = 2.
For the second row, you are computing with(dat, sum(ID[dat$ID==2])), which is 2+2+2 = 6.
(I intentionally wrote dat$ID in each index, rather than ID, because that is what your aggregate call does.)
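
To check the arithmetic above, here is a minimal sketch that rebuilds the question's data and sums the ID column within each group. The names first_row and second_row are only illustrative.

```r
# Rebuild the example data from the question
dat <- read.table(text = "ID value
1 4
1 7
2 8
2 3
2 3", header = TRUE)

# Sum the ID column itself within each group, as the aggregate call does
first_row  <- with(dat, sum(ID[dat$ID == 1]))  # 1 + 1
second_row <- with(dat, sum(ID[dat$ID == 2]))  # 2 + 2 + 2

print(c(first_row, second_row))
```

These two sums are the 2 and 6 that appear in the ID column of the output.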

The formula interface to aggregate is cleaner and gives what you seem to want: the sum of the value column for each group, with ID shown as the grouping value:

> aggregate(value ~ ID, data=dat, sum)
  ID value
1  1    11
2  2    14
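
If you would rather keep the by= interface, one possible alternative (a sketch, not the only way to do it) is to pass only the columns you want summed as the first argument, so that ID is used for grouping and nothing else. The name res is only illustrative.

```r
# Rebuild the example data from the question
dat <- read.table(text = "ID value
1 4
1 7
2 8
2 3
2 3", header = TRUE)

# Aggregate only the 'value' column; ID appears once, as the grouping column
res <- aggregate(dat["value"], by = list(ID = dat$ID), FUN = sum)
print(res)
```

This should produce the same two-column result as the formula call, because ID is no longer among the columns being summed.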

Upvotes: 2
