Reputation: 95
I have some data that I would like to summarize:
    studentid friend Gfriend
214  30401006      0       0
236  30401006      0       0
208  30401006      1       0
229  30401006      0       0
207  30401006      0       0
278  30401007      1       0
250  30401007      1       0
266  30401007      1       0
254  30401007      1       1
277  30401007      1       1
243  30401007      1       1
The result should look something like this:
studentid friend Gfriend
 30401006      1       0
 30401007      6       3
When I try:
agg = aggregate(c(friend) ~ studentid, data = df, FUN = sum)
I get the required result (but only for the friend variable).
But when I try:
agg = aggregate(c(friend, Gfriend) ~ studentid, data = df, FUN = sum)
I get:
Error in model.frame.default(formula = c(friend, Gfriend) ~ studentid, : variable lengths differ (found for 'studentid')
I checked the lengths of the variables with length(var) and they are all the same, and there are no NAs, so I have no idea where this error is coming from.
Why is this happening?
Upvotes: 3
Views: 4334
Reputation: 716
EDIT: added na.rm = TRUE to address the comment about excluding NAs.
Check out the "plyr" package.
library(plyr)
# split by "studentid" and sum all numeric columns
ddply(df, .(studentid), numcolwise(sum, na.rm = TRUE))
  studentid friend Gfriend
1  30401006      1       0
2  30401007      6       3
Upvotes: 0
Reputation: 2989
You could also try by():
studentid <- c(30401006, 30401006, 30401006, 30401006, 30401006, 30401007,
               30401007, 30401007, 30401007, 30401007, 30401007)
friend <- c(0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1)
Gfriend <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
df <- data.frame(studentid, friend, Gfriend)
df
# apply colSums to the friend and Gfriend columns within each studentid group
result <- by(df[2:3], df$studentid, FUN = colSums)
result
df$studentid: 30401006
friend Gfriend
1 0
df$studentid: 30401007
friend Gfriend
6 3
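As for why the original call fails: c(friend, Gfriend) concatenates the two columns into a single vector twice as long as studentid, which is where "variable lengths differ" comes from. Binding the columns side by side with cbind() on the left-hand side of the formula avoids that. A minimal sketch, assuming the same df as above (rebuilt here so the snippet runs on its own):

```r
# Rebuild the example data from the question
studentid <- c(rep(30401006, 5), rep(30401007, 6))
friend    <- c(0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1)
Gfriend   <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
df <- data.frame(studentid, friend, Gfriend)

# c() stacks the two columns end to end, so the response no longer
# has the same length as studentid
len_c  <- length(c(df$friend, df$Gfriend))
len_id <- length(df$studentid)

# cbind() keeps one row per observation with two response columns
agg <- aggregate(cbind(friend, Gfriend) ~ studentid, data = df, FUN = sum)
agg
```

If every column other than studentid should be summed, aggregate(. ~ studentid, data = df, FUN = sum) should give the same table without naming the columns.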
Upvotes: 0