Reputation: 1453
I want to fill the all the lag rows with NA if the condition is matched. I'm explaining it on my example dataframe:
> df
# A tibble: 10 x 3
date condition return
<date> <dbl> <int>
1 2020-05-28 0 1
2 2020-05-29 0 2
3 2020-05-30 1 3
4 2020-05-31 0 4
5 2020-06-01 0 5
6 2020-06-02 0 6
7 2020-06-03 0 7
8 2020-06-04 0 8
9 2020-06-05 0 9
10 2020-06-06 0 10
Now I'm trying to mutate multiple (in this example 3) new columns baesed on the "return"-column as follows:
If the lag "condition"-value == 1, then replace the "return"-value by NA
.
The same applies to other lags (1,2,3). But in this case NA must be filled for all lags:
date condition return lag1 lag2 lag3
<date> <dbl> <int> <int> <int> <int>
1 2020-05-28 0 1 1 1 1
2 2020-05-29 0 2 2 2 2
3 2020-05-30 1 3 3 3 3
4 2020-05-31 0 4 NA NA NA
5 2020-06-01 0 5 5 NA NA
6 2020-06-02 0 6 6 6 NA
7 2020-06-03 0 7 7 7 7
8 2020-06-04 0 8 8 8 8
9 2020-06-05 0 9 9 9 9
10 2020-06-06 0 10 10 10 10
Can someone help me?
Here is my dataframe:
df <- tibble(date = lubridate::today() + lubridate::days(1:10),
condition = c(0,0,1,0,0,0,0,0,0,0),
return = 1:10)
Upvotes: 2
Views: 351
Reputation: 35554
You can use "[<-"()
to assign NA
into the position where the condition matches.
library(dplyr)
df %>%
mutate(lag1 = `[<-`(return, which(condition == 1) + 1, NA),
lag2 = `[<-`(return, which(condition == 1) + 1:2, NA),
lag3 = `[<-`(return, which(condition == 1) + 1:3, NA))
If you do not want to write a single line for each lag, then you can set any lags you want to a vector object and apply mutate()
iteratively by reduce()
in purrr
.
library(purrr)
lag_num <- 1:3
reduce(lag_num,
~ mutate(.x, !!paste0("lag", .y) := `[<-`(return, which(condition == 1) + 1:.y, NA)),
.init = df)
The corresponding base
R version:
Reduce(function(x, y){
x[[paste0("lag", y)]] <- `[<-`(x$return, which(x$condition == 1) + 1:y, NA)
return(x)
}, lag_num, init = df)
Output
# # A tibble: 10 x 6
# date condition return lag1 lag2 lag3
# <date> <dbl> <int> <int> <int> <int>
# 1 2020-05-28 0 1 1 1 1
# 2 2020-05-29 0 2 2 2 2
# 3 2020-05-30 1 3 3 3 3
# 4 2020-05-31 0 4 NA NA NA
# 5 2020-06-01 0 5 5 NA NA
# 6 2020-06-02 0 6 6 6 NA
# 7 2020-06-03 0 7 7 7 7
# 8 2020-06-04 0 8 8 8 8
# 9 2020-06-05 0 9 9 9 9
# 10 2020-06-06 0 10 10 10 10
Upvotes: 1
Reputation: 25225
An option using data.table
:
nlags <- 3L
locs <- DT[condition==1L, which=TRUE]
ix <- matrix(NA_integer_, nrow=length(locs), ncol=nlags)
for (x in 1L:nlags) {
ix[, x] <- pmin(locs + x, nrow(DT))
set(DT, j=paste0("lag", x), value=replace(DT$return, c(ix), NA_integer_))
}
And equivalently in base R:
nlags <- 3L
locs <- which(DT$condition==1L)
ix <- matrix(NA_integer_, nrow=length(locs), ncol=nlags)
for (x in 1L:nlags) {
ix[, x] <- pmin(locs + x, nrow(DT))
DT[, paste0("lag", x)] <- replace(DT$return, ix, NA_integer_)
}
data:
library(data.table)
DT <- fread("date condition return
2020-05-28 0 1
2020-05-29 0 2
2020-05-30 1 3
2020-05-31 0 4
2020-06-01 0 5
2020-06-02 0 6
2020-06-03 0 7
2020-06-04 0 8
2020-06-05 0 9
2020-06-06 0 10")
Upvotes: 1