Reputation: 323
I have a data frame that looks like:
df<-data.frame(id=c("xx33","xx33","xx22","xx11","xx11","xx00"),amount=c(10,15,100,20,10,15),date=c("01/02/2013","01/02/2013","02/02/2013","03/03/2013","03/03/2013","04/04/2013"))
id amount date
1 xx33 10 01/02/2013
2 xx33 15 01/02/2013
3 xx22 100 02/02/2013
4 xx11 20 03/03/2013
5 xx11 10 03/03/2013
6 xx00 15 04/04/2013
I want to collapse the rows that share an id, summing the amount and counting the number of occurrences of each id, while also carrying along the information that is the same within each id, such as date (along with any other such variable). So, I want the output to be:
id sum date number
1 xx33 25 01/02/2013 2
2 xx22 100 02/02/2013 1
3 xx11 30 03/03/2013 2
4 xx00 15 04/04/2013 1
I've tried
ddply(.data = df, .var = "id", .fun = nrow)
and that returns the total number of occurrences of each id, but I can't figure out a way to sum the amounts for the common ids without looping.
Upvotes: 10
Views: 26796
Reputation: 44614
An obligatory base R answer:
unique(transform(df, amount = ave(amount, id, FUN = sum),
                     count  = ave(amount, id, FUN = length)))
# id amount date count
# 1 xx33 25 01/02/2013 2
# 3 xx22 100 02/02/2013 1
# 4 xx11 30 03/03/2013 2
# 6 xx00 15 04/04/2013 1
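To see why the `unique()` call is needed, it may help to look at what `ave()` returns on its own. The following is a minimal sketch using the data from the question; the names `totals`, `counts` and `collapsed` are only illustrative and do not appear in the answer above.

```r
df <- data.frame(
  id     = c("xx33", "xx33", "xx22", "xx11", "xx11", "xx00"),
  amount = c(10, 15, 100, 20, 10, 15),
  date   = c("01/02/2013", "01/02/2013", "02/02/2013",
             "03/03/2013", "03/03/2013", "04/04/2013")
)

# ave() returns one value per input row: every row receives the
# result computed for its own group.
totals <- ave(df$amount, df$id, FUN = sum)     # 25 25 100 30 30 15
counts <- ave(df$amount, df$id, FUN = length)  # 2 2 1 2 2 1

# Once amount is replaced by the group total, rows sharing an id are
# exact duplicates, so unique() keeps one row per id.
collapsed <- unique(data.frame(id = df$id, amount = totals,
                               date = df$date, count = counts))
```

This relies on the assumption stated in the question that every other column (here `date`) is constant within an id; if it were not, the rows would not be duplicates and `unique()` would keep more than one row per id.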
Upvotes: 4
Reputation: 61214
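Here's a base R solution. The per-id sums and the per-id counts are merged on id so that each count lands on the right row; binding them with cbind() would pair them up incorrectly, because table() sorts by id, which is not the row order aggregate() returns here.
> merge(aggregate(amount ~ id + date, data = df, FUN = sum), as.data.frame(table(id = df$id)), by = "id")
id date amount Freq
1 xx00 04/04/2013 15 1
2 xx11 03/03/2013 30 2
3 xx22 02/02/2013 100 1
4 xx33 01/02/2013 25 2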
Upvotes: 6
Reputation: 15458
Here is a solution using the plyr package:
library(plyr)
ddply(df,.(date,id),summarize,sum=sum(amount),number=length(id))
date id sum number
1 01/02/2013 xx33 25 2
2 02/02/2013 xx22 100 1
3 03/03/2013 xx11 30 2
4 04/04/2013 xx00 15 1
Upvotes: 9
Reputation: 12905
Using the data.table library:
library(data.table)
dt <- data.table(df)
dt2 <- dt[,list(sumamount = sum(amount), freq = .N), by = c("id","date")]
Output:
> dt2
id date sumamount freq
1: xx33 01/02/2013 25 2
2: xx22 02/02/2013 100 1
3: xx11 03/03/2013 30 2
4: xx00 04/04/2013 15 1
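The question also asks about carrying along "any other variable" that is constant within an id. Listing every such column in `by` works, but one possible alternative is to group on `id` alone and take the first value of each carried column. This is only a sketch under that constancy assumption, not part of the answer above; the name `res` is illustrative.

```r
library(data.table)

df <- data.frame(
  id     = c("xx33", "xx33", "xx22", "xx11", "xx11", "xx00"),
  amount = c(10, 15, 100, 20, 10, 15),
  date   = c("01/02/2013", "01/02/2013", "02/02/2013",
             "03/03/2013", "03/03/2013", "04/04/2013")
)
dt <- as.data.table(df)

# Group by id only; date[1] picks the first date in each group, which is
# assumed (not checked) to be the same for every row of that id.
res <- dt[, .(sum = sum(amount), number = .N, date = date[1]), by = id]
```

If the carried column might vary within an id, grouping by both columns as in the answer above is the safer choice, since it keeps the differing rows apart rather than silently dropping values.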
Upvotes: 9