Reputation: 127
how do I make a calculation based on the result of a previous line in R? To make it more clearly, just imagine the following dataframe:
user rev total_rev
A 10 10
A 10 20
A 20 40
A 10 50
B 50 50
C 50 50
C 10 60
C 20 80
where user is an unique ID variable and rev a metric variable (for example revenue), which I want to aggregate into the new variable "total_rev". It should contain the sum of the variable "rev" up to the specific line, thus someting like the following calculation has to be conducted for each line:
> total_rev[i] = total_rev[i-1] + rev[i]
where i is the actual line
Note that the calculation has to start from zero for each user. I've already tried to solve this with a loop, which worked for a small testcase, but the dateframe is quite huge and the calculation on the complete data set just didn't want to end.
Upvotes: 3
Views: 82
Reputation: 61164
When dealing with huge database, data.table
is a good option
> library(data.table)
> DT <- data.table(df)
> DT[, total:= cumsum(rev), by=list(user) ]
> DT
user rev total_rev total
1: A 10 10 10
2: A 10 20 20
3: A 20 40 40
4: A 10 50 50
5: B 50 50 50
6: C 50 50 50
7: C 10 60 60
8: C 20 80 80
Upvotes: 3
Reputation: 25736
You could use ?ave
and ?cumsum
:
ave(df$rev, df$user, cumsum)
E.g.:
df <- read.table(textConnection("user rev total_rev
A 10 10
A 10 20
A 20 40
A 10 50
B 50 50
C 50 50
C 10 60
C 20 80"), header=TRUE)
df$total <- ave(df$rev, df$user, cumsum)
# user rev total_rev total
#1 A 10 10 10
#2 A 10 20 20
#3 A 20 40 40
#4 A 10 50 50
#5 B 50 50 50
#6 C 50 50 50
#7 C 10 60 60
#8 C 20 80 80
Upvotes: 2
Reputation: 15458
library(plyr)
mydata<-mtcars
ddply(mydata,.(cyl),transform,mpg=cumsum(mpg))
For your data:
library(plyr)
ddply(yourdata,.(user),transform,total_rev=cumsum(rev))
user rev total_rev
1 A 10 10
2 A 10 20
3 A 20 40
4 A 10 50
5 B 50 50
6 C 50 50
7 C 10 60
8 C 20 80
Upvotes: 0