Cumulative sum in R based on two columns

Question

I'm new to R and have tried to work this out from earlier questions, without much success. My data looks roughly like this:

Name     Date        Value
A        2014-09-11  1.23
A        2014-12-11  4.56
A        2014-03-01  7.89
A        2014-06-05  0.12
B        2014-09-25  9.87
B        2014-12-21  6.54
B        2014-11-12  3.21

I want to do two things to a data frame. First, add an index column that counts the cumulative occurrences of each value in the column Name (which contains strings, not factors). Second, for each Name, replace every Value at cumulative index k or larger with the Value at index k-1 for that Name.

So for k=4, the result would be:

Name     Date        Value
A        2014-09-11  1.23
A        2014-12-11  4.56
A        2014-03-01  7.89
A        2014-06-05  7.89
B        2014-09-25  9.87
B        2014-12-21  6.54
B        2014-11-12  3.21

Any hints on how to do this in idiomatic R? Looping over the frame would probably work, but I'd like to learn the way it was intended to be done and pick up some R skills along the way.

Mahdi Jadaliha · Accepted Answer

I think that you are looking for this:

library(data.table)

# Example data from the question
A <- data.table(
  Name  = c("A", "A", "A", "A", "B", "B", "B"),
  Date  = c("2014-09-11", "2014-12-11", "2014-03-01", "2014-06-05", "2014-09-25", "2014-12-21", "2014-11-12"),
  Value = c(1.23, 4.56, 7.89, 0.12, 9.87, 6.54, 3.21)
)

# IX: running count of rows within each Name (1, 2, 3, ...)
A[, IX := seq_len(.N), by = Name]


Edit: Since you corrected the question, I have updated my answer.

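# Keep the first b elements, then repeat element b for the rest of the group.
# pmin() caps the index at b, so groups with fewer than b rows are left unchanged
# (the rep()-based version fails when length(x) < b).
func <- function(x, b) x[pmin(seq_along(x), b)]
k <- 4
A[, Value := func(Value, k - 1), by = Name]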
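
For comparison, the same two steps can probably be done in base R with ave(), which applies a function within groups and returns a vector in the original row order. This is only a sketch under a few assumptions: the data frame name df and the helper carry_from are made up for illustration, the rows are already in the order you want to count them in, and k is at least 2.

```r
# Example data from the question, as a plain data frame (no packages needed).
df <- data.frame(
  Name  = c("A", "A", "A", "A", "B", "B", "B"),
  Date  = c("2014-09-11", "2014-12-11", "2014-03-01", "2014-06-05",
            "2014-09-25", "2014-12-21", "2014-11-12"),
  Value = c(1.23, 4.56, 7.89, 0.12, 9.87, 6.54, 3.21),
  stringsAsFactors = FALSE
)

k <- 4  # assumed to be >= 2

# Hypothetical helper: positions k, k + 1, ... take the value at position k - 1.
# pmin() caps every index at k - 1, so shorter groups pass through unchanged.
carry_from <- function(x, k) x[pmin(seq_along(x), k - 1)]

# Step 1: cumulative count of rows within each Name.
df$IX <- ave(seq_along(df$Name), df$Name, FUN = seq_along)

# Step 2: apply the replacement within each Name.
df$Value <- ave(df$Value, df$Name, FUN = function(v) carry_from(v, k))

print(df)
```

With k = 4, the fourth row of group A picks up the value from the third row, and group B (only three rows) is untouched, which matches the expected result in the question.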
