user1946217
user1946217

Reputation: 1753

Grouping by two attributes and calculating the mean of another attribute in R

I want to group by on "col1", "col2" and get the mean of col3

newData
id    col1   col2    col3   col4   col5
1     200    2000    150     151    NA
2     200    2000    250     160   "tim"
3     201    2000    300     189    NA
4     201    2000    400     182    NA

I want my output to be

id    col1   col2    col3   col4    col5
1     200    2000    200     151     NA    
2     201    2000    350     189     NA

aggdata <-aggregate(newData, 
                by=list(newData$col1,newData$col2), 
                FUN=mean, na.rm=TRUE)

this gives me the mean of all variables which I do not want.

Upvotes: 0

Views: 599

Answers (2)

Jilber Urbina
Jilber Urbina

Reputation: 61154

It doesn't make much sense, but here's an alternative

> cbind(aggregate(col3~col1+col2, data=newData, FUN="mean"),
        newData[!duplicated(newData[, "col1"]), c("col4", "col5")])
  col1 col2 col3 col4 col5
1  200 2000  200  151 <NA>
3  201 2000  350  189 <NA>

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

Perhaps you are looking for the merge of two aggregates:

out <- merge(aggregate(col3 ~ col1 + col2, mydf, mean, na.rm = TRUE),
             aggregate(cbind(col4, col5) ~ col1 + col2, 
                       mydf, `[`, 1, na.action = na.pass),
             by = c("col1", "col2"))
out <- cbind(id = 1:nrow(out), out)
out
#   id col1 col2 col3 col4 col5
# 1  1  200 2000  200  151 <NA>
# 2  2  201 2000  350  189 <NA>

The first aggregate takes the mean of "col3". The second aggregate extracts the first element of "col4" and "col5" respectively.

I've manually created an "id" column because in your example, the "id" column in the output doesn't seem to fit any discernible pattern.

Upvotes: 1

Related Questions