Reputation: 1753
I want to group by on "col1", "col2" and get the mean of col3
newData
id col1 col2 col3 col4 col5
1 200 2000 150 151 NA
2 200 2000 250 160 "tim"
3 201 2000 300 189 NA
4 201 2000 400 182 NA
I want my output to be
id col1 col2 col3 col4 col5
1 200 2000 200 151 NA
2 201 2000 350 189 NA
aggdata <-aggregate(newData,
by=list(newData$col1,newData$col2),
FUN=mean, na.rm=TRUE)
this gives me the mean of all variables which I do not want.
Upvotes: 0
Views: 599
Reputation: 61154
It doesn't make much sense, but here's an alternative
> cbind(aggregate(col3~col1+col2, data=newData, FUN="mean"),
newData[!duplicated(newData[, "col1"]), c("col4", "col5")])
col1 col2 col3 col4 col5
1 200 2000 200 151 <NA>
3 201 2000 350 189 <NA>
Upvotes: 1
Reputation: 193517
Perhaps you are looking for the merge
of two aggregate
s:
out <- merge(aggregate(col3 ~ col1 + col2, mydf, mean, na.rm = TRUE),
aggregate(cbind(col4, col5) ~ col1 + col2,
mydf, `[`, 1, na.action = na.pass),
by = c("col1", "col2"))
out <- cbind(id = 1:nrow(out), out)
out
# id col1 col2 col3 col4 col5
# 1 1 200 2000 200 151 <NA>
# 2 2 201 2000 350 189 <NA>
The first aggregate
takes the mean
of "col3". The second aggregate
extracts the first element of "col4" and "col5" respectively.
I've manually created an "id" column because in your example, the "id" column in the output doesn't seem to fit any discernible pattern.
Upvotes: 1