Reputation: 1089
I'm using Base R 4.2.1 and have a strange problem with lagging a column.
My dataframe has two columns, one for time and the other for the yield of a bond
df_bond <- data. Frame(t = seq(0, 10, 1),
y_t = seq(.1, 0, -.01)
)
> df_bond
t y_t
1 0 0.10
2 1 0.09
3 2 0.08
4 3 0.07
5 4 0.06
6 5 0.05
7 6 0.04
8 7 0.03
9 8 0.02
10 9 0.01
11 10 0.00
I then add a column for price, and it works like a charm:
> df_bond$p_t <- 100 / (1 + df_bond$y_t)^(10 - df_bond$t)
> df_bond
t y_t p_t
1 0 0.10 38.55433
2 1 0.09 46.04278
3 2 0.08 54.02689
4 3 0.07 62.27497
5 4 0.06 70.49605
6 5 0.05 78.35262
7 6 0.04 85.48042
8 7 0.03 91.51417
9 8 0.02 96.11688
10 9 0.01 99.00990
11 10 0.00 100.00000
But if I now add a column for return, I get something weird:
> df_bond$r_prior_yr <- df_bond$p_t / lag(df_bond$p_t, 1)
> df_bond
t y_t p_t r_prior_yr
1 0 0.10 38.55433 1
2 1 0.09 46.04278 1
3 2 0.08 54.02689 1
4 3 0.07 62.27497 1
5 4 0.06 70.49605 1
6 5 0.05 78.35262 1
7 6 0.04 85.48042 1
8 7 0.03 91.51417 1
9 8 0.02 96.11688 1
10 9 0.01 99.00990 1
11 10 0.00 100.00000 1
Doing the division without the assignment shows that it is picking up a time series attribute, but I can't for the life of me explain why the ratio is always 1.
> df_bond$p_t / lag(df_bond$p_t, 1)
[1] 1 1 1 1 1 1 1 1 1 1 1
attr(,"tsp")
[1] 0 10 1
Upvotes: 1
Views: 1809
Reputation: 269481
Calculate p_t, convert to zoo, use zoo's diff with arith = FALSE to get the ratio and convert back. If a zoo object is ok then the last line can be omitted.
library(zoo)
df_bond |>
transform(p_t = 100 / (1 + y_t)^(10 - t)) |>
read.zoo() |>
transform(r_prior_year = diff(p_t, arith = FALSE, na.pad = TRUE)) |>
fortify.zoo(name = "t")
giving:
t y_t p_t r_prior_year
1 0 0.10 38.55433 NA
2 1 0.09 46.04278 1.194231
3 2 0.08 54.02689 1.173406
4 3 0.07 62.27497 1.152666
5 4 0.06 70.49605 1.132013
6 5 0.05 78.35262 1.111447
7 6 0.04 85.48042 1.090971
8 7 0.03 91.51417 1.070586
9 8 0.02 96.11688 1.050295
10 9 0.01 99.00990 1.030099
11 10 0.00 100.00000 1.010000
Upvotes: 1
Reputation: 5722
stats::lag
mus be used over time series to give your expected result. Thus, modify your code as follows:
This do not need dplyr
.
NOTE: dplyr::lag
masks the base stats::lag
function, thus if you will to use this solution, with dplyr loaded, you must use the fully qualified stats::lag
instead of lag
df_bond <- data.frame(t = ts(seq(0, 10, 1)),
y_t = ts(seq(.1, 0, -.01)))
df_bond$p_t <- 100 / (1 + df_bond$y_t)^(10 - df_bond$t)
df_bond[2:nrow(df_bond),"r_prior_yr"] <- df_bond$p_t / lag(df_bond$p_t, 1)
df_bond
#> t y_t p_t r_prior_yr
#> 1 0 0.10 38.55433 NA
#> 2 1 0.09 46.04278 0.8373589
#> 3 2 0.08 54.02689 0.8522197
#> 4 3 0.07 62.27497 0.8675538
#> 5 4 0.06 70.49605 0.8833824
#> 6 5 0.05 78.35262 0.8997281
#> 7 6 0.04 85.48042 0.9166148
#> 8 7 0.03 91.51417 0.9340676
#> 9 8 0.02 96.11688 0.9521134
#> 10 9 0.01 99.00990 0.9707805
#> 11 10 0.00 100.00000 0.9900990
Upvotes: 1
Reputation: 3854
Using the lag function from dplyr should get the result you're looking for. By default I suspect this is stats::lag causing your issue.
df_bond <- data.frame(t = seq(0, 10, 1),
y_t = seq(.1, 0, -.01))
df_bond$p_t <- 100 / (1 + df_bond$y_t)^(10 - df_bond$t)
df_bond$r_prior_yr <- df_bond$p_t / dplyr::lag(df_bond$p_t, 1)
Upvotes: 1
Reputation: 173793
Try using dplyr::lag
df_bond <- data.frame(t = seq(0, 10, 1),
y_t = seq(.1, 0, -.01)
)
df_bond$p_t <- 100 / (1 + df_bond$y_t)^(10 - df_bond$t)
df_bond$r_prior_yr <- df_bond$p_t / dplyr::lag(df_bond$p_t, 1)
df_bond
#> t y_t p_t r_prior_yr
#> 1 0 0.10 38.55433 NA
#> 2 1 0.09 46.04278 1.194231
#> 3 2 0.08 54.02689 1.173406
#> 4 3 0.07 62.27497 1.152666
#> 5 4 0.06 70.49605 1.132013
#> 6 5 0.05 78.35262 1.111447
#> 7 6 0.04 85.48042 1.090971
#> 8 7 0.03 91.51417 1.070586
#> 9 8 0.02 96.11688 1.050295
#> 10 9 0.01 99.00990 1.030099
#> 11 10 0.00 100.00000 1.010000
Upvotes: 4