Reputation: 634

How to perform lag/lead over dataframe depending on the row index

How do you lag/lead over dataframe depending on the value of the row - or the row index - to achieve the following output.

Here is a sample dataset:

id <- c(1,1,1,1,1,1)
a <- c("X1","Mar-20","X2","X3","Apr-20", "X4")

test <- data.frame(id,a)

Desired output:

id|a
1 |Mar-20
1 |Mar-20
1 |Mar-20
1 |Apr-20
1 |Apr-20
1 |Apr-20

Upvotes: 0

Answers (3)

Allan Cameron

Reputation: 174536

I think this is a more general solution to this problem.

You can use the following function to specify a number of leads (or lags if negative) for each value in the column to be modified. In a sense this vectorizes lead / lag.

lead_lag <- function(lead_lag_this, by_this, default = NA)
{
  diag(sapply(by_this, function(x) if(x == 0) lead_lag_this 
                              else if(x < 0) lead(lead_lag_this, -x, default)
                              else if(x > 0) lag(lead_lag_this, x, default)))
}

So for example, suppose we add a column to your data frame that specifies the lead or lag:

test$leadlag <- c(-1, 0, 1, -1, 0, 1)
test
#>   id      a leadlag
#> 1  1     X1      -1
#> 2  1 Mar-20       0
#> 3  1     X2       1
#> 4  1     X3      -1
#> 5  1 Apr-20       0
#> 6  1     X4       1

Then we can use the leadlag column to lead or lag column a like this:

test %>% mutate(new_a = lead_lag(a, leadlag))
#>   id      a leadlag  new_a
#> 1  1     X1      -1 Mar-20
#> 2  1 Mar-20       0 Mar-20
#> 3  1     X2       1 Mar-20
#> 4  1     X3      -1 Apr-20
#> 5  1 Apr-20       0 Apr-20
#> 6  1     X4       1 Apr-20

Of course if you just want to create a repeating pattern like c(-1, 0, 1) based on the row number (without specifying the column leadlag) you could do

test %>% mutate(new_a = lead_lag(a, (row_number() - 1) %% 3 - 1))

Upvotes: 1

Peter

Reputation: 12739

Using dplyr and stringr assuming that the pattern shown in the question is consistent for all your data.

library(dplyr)
library(stringr)

test %>% 
  mutate(a = case_when(lead(str_detect(a, "-")) ~ lead(a),
                       lag(str_detect(a, "-")) ~ lag(a),
                       TRUE ~ a))
#>   id      a
#> 1  1 Mar-20
#> 2  1 Mar-20
#> 3  1 Mar-20
#> 4  1 Apr-20
#> 5  1 Apr-20
#> 6  1 Apr-20

^{Created on 2020-07-09 by the reprex package (v0.3.0)}

Upvotes: 3

Sotos

Reputation: 51592

You can create a grouping variable based on the number of rows of your df and replace first and third instance by 2nd one, i.e.

with(test, ave(a, rep(seq(nrow(test)/3), each = 3), FUN = function(i){i[c(1, 3)] <- i[2]; i}))
#[1] Mar-20 Mar-20 Mar-20 Apr-20 Apr-20 Apr-20

Upvotes: 1

How to perform lag/lead over dataframe depending on the row index

Answers (3)

Related Questions