creativename

Reputation: 418

Iterative Calculations using dplyr

I would like to know if the following calculation is possible using dplyr.

library(dplyr)

x <- data.frame(
  yr = c(2012, 2013, 2014, 2015, 2016),
  rate = c(1.1, 1.2, 0.8, -0.4, 0.5)
) %>% arrange(desc(yr))

This is how I want to calculate y:

y[i] = ifelse(yr == max(yr), 100,
    100 * y[i-1]/(100 + rate[i-1]))

If I try something like this:

x %>%
  mutate(
    y = ifelse(
      yr == max(yr), 100,
      100 * lag(y) / (100 + lag(rate)) 
    )
  )

it returns the following error: Evaluation error: object 'y' not found.

As reflected in the title, I would like a dplyr solution that works inside a pipe, without packages like zoo or data.table, mainly so that it can be translated to SQL for different databases.
Is this possible?

Upvotes: 0

Views: 405

Answers (3)

G. Grothendieck

Reputation: 269491

Try cumprod like this:

x %>% mutate(y = 100 * cumprod(100 / (100 + lag(rate, default = 0))))

giving:

    yr rate         y
1 2016  0.5 100.00000
2 2015 -0.4  99.50249
3 2014  0.8  99.90210
4 2013  1.2  99.10922
5 2012  1.1  97.93401
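
The reason cumprod works here is that the recurrence unrolls into a running product: each y is 100 times the product of 100 / (100 + rate) over all earlier rows. A minimal base R sketch that checks this against an explicit loop, assuming the data from the question already sorted by descending yr (the names y_loop, lagged_rate and y_cumprod are only for illustration):

```r
# Sample data from the question, sorted by descending year
x <- data.frame(
  yr   = c(2016, 2015, 2014, 2013, 2012),
  rate = c(0.5, -0.4, 0.8, 1.2, 1.1)
)

# Explicit recurrence: y[1] = 100, y[i] = 100 * y[i-1] / (100 + rate[i-1])
y_loop <- numeric(nrow(x))
y_loop[1] <- 100
for (i in seq_len(nrow(x))[-1]) {
  y_loop[i] <- 100 * y_loop[i - 1] / (100 + x$rate[i - 1])
}

# Closed form: shift rate down one row (first row gets 0),
# then take a running product of the per-row factors
lagged_rate <- c(0, head(x$rate, -1))
y_cumprod <- 100 * cumprod(100 / (100 + lagged_rate))

# Should be TRUE if both approaches agree
all.equal(y_loop, y_cumprod)
```

The shifted vector plays the same role as lag(rate, default = 0) in the dplyr call above.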

Regarding databases, I doubt dplyr can translate that, but you could use SQL directly with the database. Here is an example using sqldf with the SQLite back end. The same code also works with the H2 database back end.

library(sqldf)

sqldf("select a.yr, a.rate, 100 * coalesce(exp(sum(log(100/(100 + b.rate)))), 1) y 
      from x a left join x b on a.yr < b.yr group by a.yr 
      order by a.yr desc")

giving:

    yr rate         y
1 2016  0.5 100.00000
2 2015 -0.4  99.50249
3 2014  0.8  99.90210
4 2013  1.2  99.10922
5 2012  1.1  97.93401
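
The query relies on exp(sum(log(...))) to emulate a product aggregate, and coalesce(..., 1) covers the latest year, which has no later rows to join against and would otherwise give NULL. A small base R sketch of the same identity, under the assumption that every factor is positive (that is, 100 + rate > 0), since log is not defined otherwise; the variable names are illustrative only:

```r
# Rates from the question, ordered by descending year
rate <- c(0.5, -0.4, 0.8, 1.2, 1.1)

# Per-row factors, using the previous row's rate (0 for the first row)
factors <- 100 / (100 + c(0, head(rate, -1)))

# The log/exp trick is only valid for strictly positive factors
stopifnot(all(factors > 0))

# Running product written as exp of a running sum of logs
y_log  <- 100 * exp(cumsum(log(factors)))
y_prod <- 100 * cumprod(factors)

# Should be TRUE up to floating point tolerance
all.equal(y_log, y_prod)
```

If a rate of -100 or lower could occur, this identity breaks down and the product would need to be computed another way.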

Upvotes: 3

Prem

Reputation: 11955

Another option is to use a for loop:

library(dplyr)

#initialize column "y"
x$y <- NA

#process one row at a time
for (i in seq(nrow(x))) {
  x[i,] <- (x[seq(i),] %>%
              mutate(y = ifelse(yr==max(yr), 100, 100 * lag(y) / (100 + lag(rate)))))[i,]
}
x

Output is:

    yr rate         y
1 2016  0.5 100.00000
2 2015 -0.4  99.50249
3 2014  0.8  99.90210
4 2013  1.2  99.10922
5 2012  1.1  97.93401

Sample data:

x <- structure(list(yr = c(2016, 2015, 2014, 2013, 2012), rate = c(0.5, 
-0.4, 0.8, 1.2, 1.1)), class = "data.frame", row.names = c(NA, 
-5L), .Names = c("yr", "rate"))

Upvotes: 0

akrun

Reputation: 887048

An option would be to use accumulate from purrr

library(tidyverse)
x %>%
   mutate(y = accumulate(rate[-n()], 
              ~  100 * .x/(100 + .y), 
                 .init = 100))
#   yr rate         y
#1 2016  0.5 100.00000 
#2 2015 -0.4  99.50249
#3 2014  0.8  99.90210
#4 2013  1.2  99.10922
#5 2012  1.1  97.93401

It can also be done in base R with Reduce

Reduce(function(u, v) 100 * u/(100 + v) , x$rate[-nrow(x)],init = 100, accumulate = TRUE)
#[1] 100.00000  99.50249  99.90210  99.10922  97.93401

Following the OP's logic, the first element is initialized to 100 and each later element is computed from the previous one:

>  100 * (100)/(100 + 0.5)  # 2nd element
[1] 99.50249
>  100 * 99.50249/(100 - 0.4) # 3rd element
[1] 99.9021
>  100 * 99.9021/(100 + 0.8) # 4th element
[1] 99.10923
>  100 * 99.10923/(100 + 1.2) # 5th element
[1] 97.93402

Upvotes: 3
