Reputation: 71
So I have a table structured like this
id V1 V2
101, 500, 1
101, 600, 1
102, 300, 0
102, 300, 0
102, 400, 0
102, 100, 1
103, 200, 0
103, 400, 0
104, 200, 1
And basically for each id, I want to calculate the mean of V1, and the sum of V2, so the new table should look like this
id V1 V2
101, 550, 2
102, 275, 1
103, 400, 0
104, 200, 1
If anyone can help out I'd really appreciate that.
Upvotes: 2
Views: 521
Reputation: 887831
We can use one of the aggregating functions for this kind of problems. Here, I use dplyr
. We group_by
'id' and summarise
the 'V1' and 'V2' columns with mean
and sum
of those corresponding columns.
library(dplyr)
df1 %>%
group_by(id) %>%
summarise(V1=mean(V1, na.rm=TRUE), V2= sum(V2, na.rm=TRUE))
# id V1 V2
#1 101 550 2
#2 102 275 1
#3 103 300 0
#4 104 200 1
Or another option is data.table
. We convert the 'data.frame' to 'data.table' (setDT(df1)
), grouped by 'id', we get the mean
and sum
of the columns.
library(data.table)
setDT(df1)[, list(V1=mean(V1, na.rm=TRUE), V2= sum(V2, na.rm=TRUE)), by = id]
# id V1 V2
#1: 101 550 2
#2: 102 275 1
#3: 103 300 0
#4: 104 200 1
Or using base R
do.call(rbind, by(df1, df1[1], FUN=function(x)
data.frame(id=x[1,1], V1= mean(x[,2], na.rm=TRUE),
V2=sum(x[,3], na.rm=TRUE))))
Upvotes: 6