TobKel
TobKel

Reputation: 1453

R Fill new columns with NA if lag condition is matched

I want to fill the all the lag rows with NA if the condition is matched. I'm explaining it on my example dataframe:

> df
# A tibble: 10 x 3
   date       condition return
   <date>         <dbl>  <int>
 1 2020-05-28         0      1
 2 2020-05-29         0      2
 3 2020-05-30         1      3
 4 2020-05-31         0      4
 5 2020-06-01         0      5
 6 2020-06-02         0      6
 7 2020-06-03         0      7
 8 2020-06-04         0      8
 9 2020-06-05         0      9
10 2020-06-06         0     10

Now I'm trying to mutate multiple (in this example 3) new columns baesed on the "return"-column as follows:

If the lag "condition"-value == 1, then replace the "return"-value by NA.

The same applies to other lags (1,2,3). But in this case NA must be filled for all lags:

   date       condition return  lag1  lag2  lag3
   <date>         <dbl>  <int> <int> <int> <int>
 1 2020-05-28         0      1     1     1     1
 2 2020-05-29         0      2     2     2     2
 3 2020-05-30         1      3     3     3     3
 4 2020-05-31         0      4    NA    NA    NA
 5 2020-06-01         0      5     5    NA    NA
 6 2020-06-02         0      6     6     6    NA
 7 2020-06-03         0      7     7     7     7
 8 2020-06-04         0      8     8     8     8
 9 2020-06-05         0      9     9     9     9
10 2020-06-06         0     10    10    10    10

Can someone help me?

Here is my dataframe:

df <- tibble(date = lubridate::today() + lubridate::days(1:10),
             condition = c(0,0,1,0,0,0,0,0,0,0),
             return = 1:10)

Upvotes: 2

Views: 351

Answers (2)

Darren Tsai
Darren Tsai

Reputation: 35554

You can use "[<-"() to assign NA into the position where the condition matches.

library(dplyr)

df %>%
  mutate(lag1 = `[<-`(return, which(condition == 1) + 1, NA),
         lag2 = `[<-`(return, which(condition == 1) + 1:2, NA),
         lag3 = `[<-`(return, which(condition == 1) + 1:3, NA))

If you do not want to write a single line for each lag, then you can set any lags you want to a vector object and apply mutate() iteratively by reduce() in purrr.

library(purrr)

lag_num <- 1:3
reduce(lag_num,
       ~ mutate(.x, !!paste0("lag", .y) := `[<-`(return, which(condition == 1) + 1:.y, NA)),
       .init = df)

The corresponding base R version:

Reduce(function(x, y){
  x[[paste0("lag", y)]] <- `[<-`(x$return, which(x$condition == 1) + 1:y, NA)
  return(x)
}, lag_num, init = df)

Output

# # A tibble: 10 x 6
#    date       condition return  lag1  lag2  lag3
#    <date>         <dbl>  <int> <int> <int> <int>
#  1 2020-05-28         0      1     1     1     1
#  2 2020-05-29         0      2     2     2     2
#  3 2020-05-30         1      3     3     3     3
#  4 2020-05-31         0      4    NA    NA    NA
#  5 2020-06-01         0      5     5    NA    NA
#  6 2020-06-02         0      6     6     6    NA
#  7 2020-06-03         0      7     7     7     7
#  8 2020-06-04         0      8     8     8     8
#  9 2020-06-05         0      9     9     9     9
# 10 2020-06-06         0     10    10    10    10

Upvotes: 1

chinsoon12
chinsoon12

Reputation: 25225

An option using data.table:

nlags <- 3L
locs <- DT[condition==1L, which=TRUE]
ix <- matrix(NA_integer_, nrow=length(locs), ncol=nlags)
for (x in 1L:nlags) {
    ix[, x] <- pmin(locs + x, nrow(DT))
    set(DT, j=paste0("lag", x), value=replace(DT$return, c(ix), NA_integer_))
}    

And equivalently in base R:

nlags <- 3L
locs <- which(DT$condition==1L)
ix <- matrix(NA_integer_, nrow=length(locs), ncol=nlags)
for (x in 1L:nlags) {
    ix[, x] <- pmin(locs + x, nrow(DT))
    DT[, paste0("lag", x)] <- replace(DT$return, ix, NA_integer_)
}

data:

library(data.table)
DT <- fread("date       condition return
2020-05-28         0      1
2020-05-29         0      2
2020-05-30         1      3
2020-05-31         0      4
2020-06-01         0      5
2020-06-02         0      6
2020-06-03         0      7
2020-06-04         0      8
2020-06-05         0      9
2020-06-06         0     10")

Upvotes: 1

Related Questions