user2635656
user2635656

Reputation: 127

Some specific calculation in a data.frame

how do I make a calculation based on the result of a previous line in R? To make it more clearly, just imagine the following dataframe:

user   rev   total_rev
A      10    10   
A      10    20 
A      20    40
A      10    50
B      50    50  
C      50    50   
C      10    60
C      20    80

where user is an unique ID variable and rev a metric variable (for example revenue), which I want to aggregate into the new variable "total_rev". It should contain the sum of the variable "rev" up to the specific line, thus someting like the following calculation has to be conducted for each line:

> total_rev[i] = total_rev[i-1] + rev[i]

where i is the actual line

Note that the calculation has to start from zero for each user. I've already tried to solve this with a loop, which worked for a small testcase, but the dateframe is quite huge and the calculation on the complete data set just didn't want to end.

Upvotes: 3

Views: 82

Answers (3)

Jilber Urbina
Jilber Urbina

Reputation: 61164

When dealing with huge database, data.table is a good option

> library(data.table)
> DT <- data.table(df)
> DT[, total:= cumsum(rev), by=list(user) ]
> DT
   user rev total_rev total
1:    A  10        10    10
2:    A  10        20    20
3:    A  20        40    40
4:    A  10        50    50
5:    B  50        50    50
6:    C  50        50    50
7:    C  10        60    60
8:    C  20        80    80

Upvotes: 3

sgibb
sgibb

Reputation: 25736

You could use ?ave and ?cumsum:

ave(df$rev, df$user, cumsum)

E.g.:

df <- read.table(textConnection("user   rev   total_rev
A      10    10   
A      10    20 
A      20    40
A      10    50
B      50    50  
C      50    50   
C      10    60
C      20    80"), header=TRUE)

df$total <- ave(df$rev, df$user, cumsum)
#  user rev total_rev total
#1    A  10        10    10
#2    A  10        20    20
#3    A  20        40    40
#4    A  10        50    50
#5    B  50        50    50
#6    C  50        50    50
#7    C  10        60    60
#8    C  20        80    80

Upvotes: 2

Metrics
Metrics

Reputation: 15458

library(plyr)
mydata<-mtcars
ddply(mydata,.(cyl),transform,mpg=cumsum(mpg))

For your data:

library(plyr)
ddply(yourdata,.(user),transform,total_rev=cumsum(rev))

 user rev total_rev
1    A  10        10
2    A  10        20
3    A  20        40
4    A  10        50
5    B  50        50
6    C  50        50
7    C  10        60
8    C  20        80

Upvotes: 0

Related Questions