Nigel Stackhouse

Reputation: 364

R apply function to data based on index column value

Example:

require(data.table)
example = matrix(c(rnorm(15, 5, 1), rep(1:3, each=5)), ncol = 2, nrow = 15)
example = data.table(example)
setnames(example, old=c("V1","V2"), new=c("target", "index"))
example


threshold = 100

accumulating_cost = function(x,y) { x-cumsum(y) }
whats_left = accumulating_cost(threshold, example$target)
whats_left

I want whats_left to hold the difference between threshold and the cumulative sum of example$target, computed separately for each value of example$index (1, 2, and 3), so that the cumulative sum restarts within each group. To get that, I used the following for loop:

rm(whats_left)

whats_left = vector("list")
for(i in 1:max(example$index)) {
  whats_left[[i]] = accumulating_cost(threshold, example$target[example$index==i])
}

whats_left = unlist(whats_left)
whats_left

plot(whats_left~c(1:15))

I know for loops aren't the devil in R, but I'm trying to get into the habit of using vectorization where possible (including moving away from apply, which is just a for loop wrapper). I'm pretty sure it's possible here, but I can't figure out how to do it. Any help would be much appreciated.

Upvotes: 1

Views: 264

Answers (1)

David Arenburg

Reputation: 92282

All you're trying to do is accumulate the cost by index, so you might want to use the by argument, as in:

example[, accumulating_cost(threshold, target), by = index]
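
For reference, here is a minimal, self-contained sketch of that grouped call, checked against the loop from the question. The set.seed() call and the names res, loop_res, and base_res are my own additions for illustration. The data.table is also built directly rather than through a matrix, which should give the same columns.

```r
library(data.table)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
example <- data.table(target = rnorm(15, 5, 1), index = rep(1:3, each = 5))
threshold <- 100

accumulating_cost <- function(x, y) x - cumsum(y)

# Grouped call: one output row per input row, result in the default column V1
res <- example[, accumulating_cost(threshold, target), by = index]

# The loop from the question, kept only as a reference result
loop_res <- vector("list")
for (i in 1:max(example$index)) {
  loop_res[[i]] <- accumulating_cost(threshold, example$target[example$index == i])
}
loop_res <- unlist(loop_res)

# Same computation stored as a new column, added by reference
example[, whats_left := accumulating_cost(threshold, target), by = index]

# Base R alternative using ave(), which applies cumsum within each index group
base_res <- threshold - ave(example$target, example$index, FUN = cumsum)
```

The := form is probably the more convenient one if you want whats_left next to the original columns. Note that by returns groups in order of first appearance, so res lines up with the loop output here only because the rows are already ordered by index.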

Upvotes: 3
