Reputation: 634
How do you lag/lead over dataframe depending on the value of the row - or the row index - to achieve the following output.
Here is a sample dataset:
id <- c(1,1,1,1,1,1)
a <- c("X1","Mar-20","X2","X3","Apr-20", "X4")
test <- data.frame(id,a)
Desired output:
id|a
1 |Mar-20
1 |Mar-20
1 |Mar-20
1 |Apr-20
1 |Apr-20
1 |Apr-20
Upvotes: 0
Views: 307
Reputation: 174536
I think this is a more general solution to this problem.
You can use the following function to specify a number of leads (or lags if negative) for each value in the column to be modified. In a sense this vectorizes lead / lag.
lead_lag <- function(lead_lag_this, by_this, default = NA)
{
diag(sapply(by_this, function(x) if(x == 0) lead_lag_this
else if(x < 0) lead(lead_lag_this, -x, default)
else if(x > 0) lag(lead_lag_this, x, default)))
}
So for example, suppose we add a column to your data frame that specifies the lead or lag:
test$leadlag <- c(-1, 0, 1, -1, 0, 1)
test
#> id a leadlag
#> 1 1 X1 -1
#> 2 1 Mar-20 0
#> 3 1 X2 1
#> 4 1 X3 -1
#> 5 1 Apr-20 0
#> 6 1 X4 1
Then we can use the leadlag
column to lead or lag column a
like this:
test %>% mutate(new_a = lead_lag(a, leadlag))
#> id a leadlag new_a
#> 1 1 X1 -1 Mar-20
#> 2 1 Mar-20 0 Mar-20
#> 3 1 X2 1 Mar-20
#> 4 1 X3 -1 Apr-20
#> 5 1 Apr-20 0 Apr-20
#> 6 1 X4 1 Apr-20
Of course if you just want to create a repeating pattern like c(-1, 0, 1)
based on the row number (without specifying the column leadlag
) you could do
test %>% mutate(new_a = lead_lag(a, (row_number() - 1) %% 3 - 1))
Upvotes: 1
Reputation: 12739
Using dplyr
and stringr
assuming that the pattern shown in the question is consistent for all your data.
library(dplyr)
library(stringr)
test %>%
mutate(a = case_when(lead(str_detect(a, "-")) ~ lead(a),
lag(str_detect(a, "-")) ~ lag(a),
TRUE ~ a))
#> id a
#> 1 1 Mar-20
#> 2 1 Mar-20
#> 3 1 Mar-20
#> 4 1 Apr-20
#> 5 1 Apr-20
#> 6 1 Apr-20
Created on 2020-07-09 by the reprex package (v0.3.0)
Upvotes: 3
Reputation: 51592
You can create a grouping variable based on the number of rows of your df and replace first and third instance by 2nd one, i.e.
with(test, ave(a, rep(seq(nrow(test)/3), each = 3), FUN = function(i){i[c(1, 3)] <- i[2]; i}))
#[1] Mar-20 Mar-20 Mar-20 Apr-20 Apr-20 Apr-20
Upvotes: 1