Reputation: 418
I would like to know whether the following calculation is possible using dplyr.
library(dplyr)

x <- data.frame(
  yr = c(2012, 2013, 2014, 2015, 2016),
  rate = c(1.1, 1.2, 0.8, -0.4, 0.5)
) %>% arrange(desc(yr))
This is how I want to calculate y:
y[i] = ifelse(yr[i] == max(yr), 100,
              100 * y[i-1] / (100 + rate[i-1]))
If I try something like this:
x %>%
  mutate(
    y = ifelse(
      yr == max(yr), 100,
      100 * lag(y) / (100 + lag(rate))
    )
  )
it returns the following error: Evaluation error: object 'y' not found.
As reflected in the title, I would like a dplyr solution inside a pipe, without using packages like zoo or data.table, mainly so that it can be translated to SQL for different databases.
Would this be possible?
Upvotes: 0
Views: 405
Reputation: 269491
Try cumprod like this:
x %>% mutate(y = 100 * cumprod(100 / (100 + lag(rate, default = 0))))
giving:
yr rate y
1 2016 0.5 100.00000
2 2015 -0.4 99.50249
3 2014 0.8 99.90210
4 2013 1.2 99.10922
5 2012 1.1 97.93401
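To see why cumprod reproduces the recurrence, it helps to unroll it. A short derivation, writing r_i for the rate in row i (rows sorted by descending yr; the symbols r_i and y_i are notation introduced here, not part of the question):

```latex
\begin{aligned}
y_1 &= 100, \\
y_i &= \frac{100\, y_{i-1}}{100 + r_{i-1}} \qquad (i \ge 2), \\
\Rightarrow\quad y_i &= 100 \prod_{j=1}^{i-1} \frac{100}{100 + r_j}.
\end{aligned}
```

With lag(rate, default = 0) the first factor is 100 / (100 + 0) = 1, so the cumulative product starts at 100 and each later row multiplies in one more factor.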
Regarding databases, I doubt dplyr can translate that, but you could use SQL directly with the database. Here is an example using sqldf with the SQLite back end. The same code also works with the H2 database back end.
library(sqldf)
sqldf("select a.yr, a.rate, 100 * coalesce(exp(sum(log(100/(100 + b.rate)))), 1) y
from x a left join x b on a.yr < b.yr group by a.yr
order by a.yr desc")
giving:
yr rate y
1 2016 0.5 100.00000
2 2015 -0.4 99.50249
3 2014 0.8 99.90210
4 2013 1.2 99.10922
5 2012 1.1 97.93401
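The query uses exp(sum(log(...))) as a stand-in for a product aggregate. A minimal base R sketch of that identity on the same rates (the variable names are illustrative, and it assumes every factor is positive, i.e. rate > -100, so that log is defined):

```r
# Rates in descending-year order, as in the question's data
rate <- c(0.5, -0.4, 0.8, 1.2, 1.1)

# Per-row factor from the recurrence
f <- 100 / (100 + rate)

# Running product computed directly, and via the log/exp identity
via_prod <- cumprod(f)
via_logs <- exp(cumsum(log(f)))

# The two agree up to floating-point tolerance
print(all.equal(via_prod, via_logs))
```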
Upvotes: 3
Reputation: 11955
Another option is to use a for loop:
library(dplyr)
#initialize column "y"
x$y <- NA
#process one row at a time
for (i in seq(nrow(x))) {
  x[i,] <- (x[seq(i),] %>%
    mutate(y = ifelse(yr==max(yr), 100, 100 * lag(y) / (100 + lag(rate)))))[i,]
}
x
Output is:
yr rate y
1 2016 0.5 100.00000
2 2015 -0.4 99.50249
3 2014 0.8 99.90210
4 2013 1.2 99.10922
5 2012 1.1 97.93401
Sample data:
x <- structure(list(yr = c(2016, 2015, 2014, 2013, 2012), rate = c(0.5,
-0.4, 0.8, 1.2, 1.1)), class = "data.frame", row.names = c(NA,
-5L), .Names = c("yr", "rate"))
Upvotes: 0
Reputation: 887048
An option would be to use accumulate from purrr:
library(tidyverse)
x %>%
  mutate(y = accumulate(rate[-n()],
                        ~ 100 * .x / (100 + .y),
                        .init = 100))
# yr rate y
#1 2016 0.5 100.00000
#2 2015 -0.4 99.50249
#3 2014 0.8 99.90210
#4 2013 1.2 99.10922
#5 2012 1.1 97.93401
It can also be done in base R with Reduce:
Reduce(function(u, v) 100 * u / (100 + v), x$rate[-nrow(x)], init = 100, accumulate = TRUE)
#[1] 100.00000 99.50249 99.90210 99.10922 97.93401
Following the OP's logic, the first element is initialized to 100 and each later element is computed from the previous one:
> 100 * (100)/(100 + 0.5) # 2nd element
[1] 99.50249
> 100 * 99.50249/(100 - 0.4) # 3rd element
[1] 99.9021
> 100 * 99.9021/(100 + 0.8) # 4th element
[1] 99.10923
> 100 * 99.10923/(100 + 1.2) # 5th element
[1] 97.93402
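As a sanity check, the Reduce recurrence and the closed-form cumulative product should give the same vector. A small base R sketch, assuming the rates are already in descending-year order (the names y_reduce and y_cumprod are illustrative):

```r
# Rates in descending-year order, as in the question's data
rate <- c(0.5, -0.4, 0.8, 1.2, 1.1)
n <- length(rate)

# Step-by-step recurrence: start at 100, apply one rate per step
y_reduce <- Reduce(function(u, v) 100 * u / (100 + v),
                   rate[-n], init = 100, accumulate = TRUE)

# Closed form: lagged rates with 0 for the first row, then a running product
y_cumprod <- 100 * cumprod(100 / (100 + c(0, rate[-n])))

# Both approaches agree up to floating-point tolerance
print(all.equal(y_reduce, y_cumprod))
```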
Upvotes: 3