user9292
user9292

Reputation: 1145

Cumulative product of (1-previous_record)*current_record

The data frame contains two variables (time and rate) and 10 observations

time <- seq(1:10) 
rate <- 1-(0.99^time)
dat <- data.frame(time, rate)

I need to add a new column (called new_rate).

new_rate is defined as follows

Note: new_rate_1 is the first observation of new the column new_rate, etc.

new_rate_1 = rate_1
new_rate_2 = (1-rate_1)*rate_2
new_rate_3 = (1-rate_1)*(1-rate_2)*rate_3
new_rate_4 = (1-rate_1)*(1-rate_2)*(1-rate_3)*rate_4
...
new_rate_10 = (1-rate_1)*(1-rate_2)*(1-rate_3)*(1-rate_4)*(1-rate_5)*(1-rate_6)*(1-rate_7)*(1-rate_8)*(1-rate_9)*rate_10

How this can be done in base R or dplyr?

Upvotes: 5

Views: 677

Answers (3)

Anoushiravan R
Anoushiravan R

Reputation: 21918

In case you are still interested in how to do it with purrr::reduce family of functions. Here are two solutions:

  • In every iteration if you multiply the accumulated/previous value by (1/previous value of rate - 1) * (current value of rate) you will get the desired output in every row
library(purrr)

accumulate2(dat$rate[-nrow(dat)], dat$rate[-1], .init = dat$rate[1], 
           ~ ..1 * (1/..2 - 1) * ..3) %>%
  simplify()

 [1] 0.01000000 0.01970100 0.02881885 0.03709807 0.04432372 0.05033049 0.05500858 0.05830607
 [9] 0.06022773 0.06083074

And also in base R we could do the following:

Reduce(function(x, y) {
  x * (1/dat$rate[y - 1] - 1) * dat$rate[y]
}, init = dat$rate[1], 
seq_len(nrow(dat))[-1], accumulate = TRUE)

 [1] 0.01000000 0.01970100 0.02881885 0.03709807 0.04432372 0.05033049 0.05500858 0.05830607
 [9] 0.06022773 0.06083074

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 101628

A straightforward math approach using cumprod should work

> c(1, head(cumprod(1 - rate), -1)) * rate
 [1] 0.01000000 0.01970100 0.02881885 0.03709807 0.04432372 0.05033049
 [7] 0.05500858 0.05830607 0.06022773 0.06083074

If you want to practice with recursions, you can try the method below

f <- function(v, k = length(v)) {
    if (k == 1) {
        return(v[k])
    }
    u <- f(v, k - 1)
    c(u, tail(u, 1) * (1 / v[k - 1] - 1) * v[k])
}

such that

> f(rate)
 [1] 0.01000000 0.01970100 0.02881885 0.03709807 0.04432372 0.05033049
 [7] 0.05500858 0.05830607 0.06022773 0.06083074

Upvotes: 3

thelatemail
thelatemail

Reputation: 93833

cumprod to the rescue (hat-tip to @Cole for simplifying the code):

dat$rate * c(1, cumprod(1 - head(dat$rate, -1)))

The logic is that you are essentially doing a cumulative product of 1 - dat$rate, multiplied by the current step.
At the first step, you can just keep the existing value, but then you need to offset the two vectors so that the multiplication gives the desired result.

Proof:

out <- c(
dat$rate[1],
(1-dat$rate[1])*dat$rate[2],
(1-dat$rate[1])*(1-dat$rate[2])*dat$rate[3],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*dat$rate[4],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*dat$rate[5],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*dat$rate[6],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*dat$rate[7],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*(1-dat$rate[7])*dat$rate[8],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*(1-dat$rate[7])*(1-dat$rate[8])*dat$rate[9],
(1-dat$rate[1])*(1-dat$rate[2])*(1-dat$rate[3])*(1-dat$rate[4])*(1-dat$rate[5])*(1-dat$rate[6])*(1-dat$rate[7])*(1-dat$rate[8])*(1-dat$rate[9])*dat$rate[10]
)

all.equal(
  dat$rate * c(1, cumprod(1 - head(dat$rate, -1))),
  out
)
#[1] TRUE

Upvotes: 8

Related Questions