Reputation: 1089

Lag not working correctly on an R dataframe

I'm using Base R 4.2.1 and have a strange problem with lagging a column.

My dataframe has two columns, one for time and the other for the yield of a bond

df_bond <- data. Frame(t  = seq(0, 10, 1),
                      y_t = seq(.1, 0, -.01)
                     )
> df_bond
    t  y_t
1   0 0.10
2   1 0.09
3   2 0.08
4   3 0.07
5   4 0.06
6   5 0.05
7   6 0.04
8   7 0.03
9   8 0.02
10  9 0.01
11 10 0.00

I then add a column for price, and it works like a charm:

> df_bond$p_t <- 100 / (1 + df_bond$y_t)^(10 - df_bond$t)
> df_bond
    t  y_t       p_t
1   0 0.10  38.55433
2   1 0.09  46.04278
3   2 0.08  54.02689
4   3 0.07  62.27497
5   4 0.06  70.49605
6   5 0.05  78.35262
7   6 0.04  85.48042
8   7 0.03  91.51417
9   8 0.02  96.11688
10  9 0.01  99.00990
11 10 0.00 100.00000

But if I now add a column for return, I get something weird:

> df_bond$r_prior_yr <- df_bond$p_t / lag(df_bond$p_t, 1)
> df_bond
    t  y_t       p_t r_prior_yr
1   0 0.10  38.55433          1
2   1 0.09  46.04278          1
3   2 0.08  54.02689          1
4   3 0.07  62.27497          1
5   4 0.06  70.49605          1
6   5 0.05  78.35262          1
7   6 0.04  85.48042          1
8   7 0.03  91.51417          1
9   8 0.02  96.11688          1
10  9 0.01  99.00990          1
11 10 0.00 100.00000          1

Doing the division without the assignment shows that it is picking up a time series attribute, but I can't for the life of me explain why the ratio is always 1.

> df_bond$p_t / lag(df_bond$p_t, 1)
 [1] 1 1 1 1 1 1 1 1 1 1 1
attr(,"tsp")
[1]  0 10  1

Upvotes: 1

Answers (4)

G. Grothendieck

Reputation: 270248

Calculate p_t, convert to zoo, use zoo's diff with arith = FALSE to get the ratio and convert back. If a zoo object is ok then the last line can be omitted.

library(zoo)

df_bond |>
  transform(p_t = 100 / (1 + y_t)^(10 - t)) |>
  read.zoo() |>
  transform(r_prior_year = diff(p_t, arith = FALSE, na.pad = TRUE)) |>
  fortify.zoo(name = "t")

giving:

    t  y_t       p_t r_prior_year
1   0 0.10  38.55433           NA
2   1 0.09  46.04278     1.194231
3   2 0.08  54.02689     1.173406
4   3 0.07  62.27497     1.152666
5   4 0.06  70.49605     1.132013
6   5 0.05  78.35262     1.111447
7   6 0.04  85.48042     1.090971
8   7 0.03  91.51417     1.070586
9   8 0.02  96.11688     1.050295
10  9 0.01  99.00990     1.030099
11 10 0.00 100.00000     1.010000

Upvotes: 1

Ric

Reputation: 5721

stats::lag mus be used over time series to give your expected result. Thus, modify your code as follows: This do not need dplyr.

NOTE: dplyr::lag masks the base stats::lag function, thus if you will to use this solution, with dplyr loaded, you must use the fully qualified stats::lag instead of lag

df_bond <- data.frame(t = ts(seq(0, 10, 1)),
                      y_t = ts(seq(.1, 0, -.01)))

df_bond$p_t <- 100 / (1 + df_bond$y_t)^(10 - df_bond$t)

df_bond[2:nrow(df_bond),"r_prior_yr"] <- df_bond$p_t / lag(df_bond$p_t, 1)

df_bond
#>     t  y_t       p_t r_prior_yr
#> 1   0 0.10  38.55433         NA
#> 2   1 0.09  46.04278  0.8373589
#> 3   2 0.08  54.02689  0.8522197
#> 4   3 0.07  62.27497  0.8675538
#> 5   4 0.06  70.49605  0.8833824
#> 6   5 0.05  78.35262  0.8997281
#> 7   6 0.04  85.48042  0.9166148
#> 8   7 0.03  91.51417  0.9340676
#> 9   8 0.02  96.11688  0.9521134
#> 10  9 0.01  99.00990  0.9707805
#> 11 10 0.00 100.00000  0.9900990

Upvotes: 1

Seth

Reputation: 3884

Using the lag function from dplyr should get the result you're looking for. By default I suspect this is stats::lag causing your issue.

   df_bond <- data.frame(t  = seq(0, 10, 1),
                           y_t = seq(.1, 0, -.01))
    df_bond$p_t <- 100 / (1 + df_bond$y_t)^(10 - df_bond$t)
    df_bond$r_prior_yr <- df_bond$p_t / dplyr::lag(df_bond$p_t, 1)

Upvotes: 1

Allan Cameron

Reputation: 174516

Try using dplyr::lag

df_bond <- data.frame(t  = seq(0, 10, 1),
                       y_t = seq(.1, 0, -.01)
)
df_bond$p_t <- 100 / (1 + df_bond$y_t)^(10 - df_bond$t)
df_bond$r_prior_yr <- df_bond$p_t / dplyr::lag(df_bond$p_t, 1)
df_bond
#>     t  y_t       p_t r_prior_yr
#> 1   0 0.10  38.55433         NA
#> 2   1 0.09  46.04278   1.194231
#> 3   2 0.08  54.02689   1.173406
#> 4   3 0.07  62.27497   1.152666
#> 5   4 0.06  70.49605   1.132013
#> 6   5 0.05  78.35262   1.111447
#> 7   6 0.04  85.48042   1.090971
#> 8   7 0.03  91.51417   1.070586
#> 9   8 0.02  96.11688   1.050295
#> 10  9 0.01  99.00990   1.030099
#> 11 10 0.00 100.00000   1.010000

Upvotes: 4

Lag not working correctly on an R dataframe

Answers (4)

Related Questions