R for each row calculate a sum taking values of one of the columns from the rows BEFORE that satisfy special condition

Question

I am using R. I have a big dataset of more than 400K lines. Here is the code to reproduce 5 lines of similar data frame:

Date = as.Date(c("2013-01-03", "2013-01-03", "2013-01-04", "2013-01-04", "2013-01-05"))
CustomerID = as.factor(c("A", "B", "A", "C", "A"))
PurchaseNS = c(13, 14, 12, 8, 10)
df = data.frame(Date, CustomerID, PurchaseNS)

> df
        Date CustomerID PurchaseNS
1 2013-01-03          A         13
2 2013-01-03          B         14
3 2013-01-04          A         12
4 2013-01-04          C          8
5 2013-01-05          A         10

What I need is to add an additional column that for an each row equals to the sum of previous purchases of this customer. So in the end I am trying to get next data frame:

> df
        Date CustomerID PurchaseNS previousPurchases
1 2013-01-03          A         13                 0
2 2013-01-03          B         14                 0
3 2013-01-04          A         12                13
4 2013-01-04          C          8                 0
5 2013-01-05          A         10                25

I can achieve that with for loop, but it takes too much time and I know that it's not recommended to use loops in R.

for (i in 1:nrow(df)) {
  df[i, 4] = sum(subset(df, df$CustomerID == df$CustomerID[i] & df$Date < df$Date[i])$PurchaseNS)
}

I also tried using sapply, but the code ends up looking similar to the one above and also takes too much time.

sapply(1:nrow(df), function(i) df[i, 4] = 
     sum(subset(df, df$CustomerID == df$CustomerID[i] & df$Date < df$Date[i])$Purchase))

I guess, functions like by, with, cumsum, apply could be useful, but so far I was not able to apply them.

Thanks in advance for your suggestions!

BrodieG · Accepted Answer

You can try:

df$prevPurch <- ave(
  df$PurchaseNS, df$CustomerID, 
  FUN=function(x) cumsum(c(0, head(x, -1)))
)

which produces:

#         Date CustomerID PurchaseNS prevPurch
# 1 2013-01-03          A         13         0
# 2 2013-01-03          B         14         0
# 3 2013-01-04          A         12        13
# 4 2013-01-04          C          8         0
# 5 2013-01-05          A         10        25

ave breaks up a vector by the groups in another vector, and then applies a function to each group.

R for each row calculate a sum taking values of one of the columns from the rows BEFORE that satisfy special condition

Answers (1)

Related Questions