Reputation: 878
Suppose I have a data frame like the one below:
library(dplyr)
library(tibble)
df <- tibble(t = 1:3, a = c(4, 6, 11), b = 1:3, c = c(1, 3, 2))
df
# A tibble: 3 x 4
# t a b c
# <int> <dbl> <int> <dbl>
# 1 1 4 1 1
# 2 2 6 2 3
# 3 3 11 3 2
I want to calculate a new column d
such that
d(0) = a
d(t) = d(t-1) + b(t-1) + c(t-1)
I can do this (painfully) with a for loop:
for_loop <- add_column(df, d = 0)
for (i in 1:nrow(df)) {
if(i == 1) {
for_loop[i,]$d <- for_loop[i,]$a
}
else {
for_loop[i,]$d <- for_loop[i-1,]$d + for_loop[i-1,]$b + for_loop[i-1,]$c
}
}
for_loop
# A tibble: 3 x 5
# t a b c d
# <int> <dbl> <int> <dbl> <dbl>
# 1 1 4 1 1 4
# 2 2 6 2 3 6
# 3 3 11 3 2 11
But I want to do it using dplyr. Trying a simple ifelse doesn't seem to work, because the calculation is vectorized, so the old values of d
are used:
tidy <- add_column(df, d = 0) %>%
mutate(tidy, d = ifelse(row_number(d)==1, a, lag(d) + lag(b) + lag(c)))
tidy
# A tibble: 3 x 5
# t a b c d
# <int> <dbl> <int> <dbl> <dbl>
# 1 1 4 1 1 4
# 2 2 6 2 3 2
# 3 3 11 3 2 5
How can I use a previous value of d
to calculate a new value of d
using dplyr?
Upvotes: 2
Views: 615
Reputation: 214927
You can do cumsum
on lagged b+c
and then add the initial value a[1]
to the result:
df %>% mutate(d = a[1] + cumsum(lag(b + c, default = 0)))
# A tibble: 3 x 5
# t a b c d
# <int> <dbl> <int> <dbl> <dbl>
#1 1 4 1 1 4
#2 2 6 2 3 6
#3 3 11 3 2 11
In order to vectorize it, you need to eliminate the previous d
(d[t-1]) from your formula like follows:
d[t] = d[t-1] + b[t-1] + c[t-1]
= d[t-2] + b[t-2] + c[t-2] + b[t-1] + c[t-1]
= d[1] + 0 + b[1] + b[2] + ... + b[t-1] + 0 + c[1] + c[2] + ... + c[t-1]
= a[1] + sum(lag(b + c, default=0))
d = a[1] + cumsum(lag(b + c, default = 0))
Upvotes: 2