How to create a new column in data.table based on values of other columns

Question

I have the following data structure in the data.table format:

ID  Cycle  Cycle_Day Cycle_Date  Positive_Test_Date
1   1      1         3/28/2019   NA
1   1      2         3/29/2019   NA
1   1      3         3/30/2019   NA
1   1      NA        NA          3/29/2019
1   2      1         4/23/2019   NA 
1   2      2         4/24/2019   NA
1   2      3         4/25/2019   NA
1   2      NA        NA          4/25/2019
2   1      1         3/18/2019   NA
2   1      2         3/19/2019   NA
2   1      3         3/20/2019   NA
2   1      NA        NA          3/18/2019
2   2      1         4/23/2019   NA 
2   2      2         4/24/2019   NA
2   2      3         4/25/2019   NA
2   2      NA        NA          4/24/2019

I would like to create a new column "LH_Date" that, for every ID and every cycle, copies the date on the row where Cycle_Date matches that cycle's Positive_Test_Date. Otherwise the value should be NA. This is how it should look:

ID  Cycle  Cycle_Day Cycle_Date  Positive_Test_Date LH_Date
1   1      1         3/28/2019   NA                 NA 
1   1      2         3/29/2019   NA                 3/29/2019
1   1      3         3/30/2019   NA                 NA
1   1      NA        NA          3/29/2019          NA
1   2      1         4/23/2019   NA                 NA
1   2      2         4/24/2019   NA                 NA
1   2      3         4/25/2019   NA                 4/25/2019
1   2      NA        NA          4/25/2019          NA
2   1      1         3/18/2019   NA                 3/18/2019
2   1      2         3/19/2019   NA                 NA
2   1      3         3/20/2019   NA                 NA 
2   1      NA        NA          3/18/2019          NA
2   2      1         4/23/2019   NA                 NA
2   2      2         4/24/2019   NA                 4/24/2019
2   2      3         4/25/2019   NA                 NA
2   2      NA        NA          4/24/2019          NA

chinsoon12 · Accepted Answer

Another option is to use indexing to find the rows that fit the condition and update only those rows:

#for each group of ID and Cycle, 
#find the row indices where Cycle_Date equals the last Positive_Test_Date 
idxDT <- DT[, .I[Cycle_Date==Positive_Test_Date[.N]], .(ID, Cycle)]

#for those row indices, set LH_Date to Cycle_Date 
#(NA rows or excluded rows default to NA by design in data.table)
DT[idxDT$V1, LH_Date := Cycle_Date]

idxDT looks like this; idxDT$V1 extracts the column V1, which holds the row indices:

   ID Cycle V1
1:  1     1  2
2:  1     1 NA
3:  1     2  7
4:  1     2 NA
5:  2     1  9
6:  2     1 NA
7:  2     2 14
8:  2     2 NA
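
The NA entries in V1 come from rows where Cycle_Date is NA, so the comparison itself is NA. If you would rather not pass NA indices along, a minimal sketch (using a hypothetical one-cycle miniature of the question's data, not the full table) wraps the comparison in which(), which keeps only the positions that are TRUE:

```r
library(data.table)

# Hypothetical miniature of the question's data: one ID, one cycle
DT <- data.table(
  ID = 1L,
  Cycle = 1L,
  Cycle_Day = c(1L, 2L, 3L, NA),
  Cycle_Date = c("3/28/2019", "3/29/2019", "3/30/2019", NA),
  Positive_Test_Date = c(NA, NA, NA, "3/29/2019")
)

# which() returns only the positions where the comparison is TRUE,
# so rows where the comparison is NA do not produce an index
idx <- DT[, .I[which(Cycle_Date == Positive_Test_Date[.N])], by = .(ID, Cycle)]$V1

# update only the matching rows; all other rows of LH_Date stay NA
DT[idx, LH_Date := Cycle_Date]
```

Like the original code, this assumes the positive test is recorded on the last row of each ID/Cycle group, since it reads Positive_Test_Date[.N].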

.I contains the row index within a data.table. From ?.I:

.I is an integer vector equal to seq_len(nrow(x)). While grouping, it holds for each item in the group, its row location in x. This is useful to subset in j; e.g. DT[, .I[which.max(somecol)], by=grp].
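
To make the quoted description concrete, here is a small sketch on a made-up table (toy, grp and val are hypothetical names, not from the question). It shows that inside a grouped call .I still refers to row numbers of the whole table, which is why the result can be used to subset or update it afterwards:

```r
library(data.table)

# Hypothetical toy table, not from the question
toy <- data.table(
  grp = c("a", "a", "b", "b", "b"),
  val = c(5, 9, 2, 8, 3)
)

# Without grouping, .I is the row numbers of the whole table
all_rows <- toy[, .I]

# With grouping, .I holds each group's row numbers in the full table,
# so .I[which.max(val)] is the table-wide row index of each group's maximum
top <- toy[, .I[which.max(val)], by = grp]

# Use those indices to pull the corresponding rows
toy[top$V1]
```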

output:

    ID Cycle Cycle_Day Cycle_Date Positive_Test_Date   LH_Date
 1:  1     1         1  3/28/2019                     
 2:  1     1         2  3/29/2019                3/29/2019
 3:  1     1         3  3/30/2019                     
 4:  1     1        NA                 3/29/2019      
 5:  1     2         1  4/23/2019                     
 6:  1     2         2  4/24/2019                     
 7:  1     2         3  4/25/2019                4/25/2019
 8:  1     2        NA                 4/25/2019      
 9:  2     1         1  3/18/2019                3/18/2019
10:  2     1         2  3/19/2019                     
11:  2     1         3  3/20/2019                     
12:  2     1        NA                 3/18/2019      
13:  2     2         1  4/23/2019                     
14:  2     2         2  4/24/2019                4/24/2019
15:  2     2         3  4/25/2019                     
16:  2     2        NA                 4/24/2019

data:

library(data.table)
DT <- fread("ID  Cycle  Cycle_Day Cycle_Date  Positive_Test_Date
1   1      1         3/28/2019   NA
1   1      2         3/29/2019   NA
1   1      3         3/30/2019   NA
1   1      NA        NA          3/29/2019
1   2      1         4/23/2019   NA 
1   2      2         4/24/2019   NA
1   2      3         4/25/2019   NA
1   2      NA        NA          4/25/2019
2   1      1         3/18/2019   NA
2   1      2         3/19/2019   NA
2   1      3         3/20/2019   NA
2   1      NA        NA          3/18/2019
2   2      1         4/23/2019   NA 
2   2      2         4/24/2019   NA
2   2      3         4/25/2019   NA
2   2      NA        NA          4/24/2019")
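
For comparison, the same result can probably be reached in one grouped assignment instead of two steps. This is only a sketch under assumptions: it uses fifelse(), which needs data.table 1.12.4 or later, it runs on a hypothetical two-cycle miniature of the data rather than the full table, and it again assumes the positive test sits on the last row of each group:

```r
library(data.table)

# Hypothetical miniature of the question's data: one ID, two cycles
DT <- data.table(
  ID = 1L,
  Cycle = c(1L, 1L, 1L, 2L, 2L, 2L),
  Cycle_Date = c("3/28/2019", "3/29/2019", NA, "4/23/2019", "4/24/2019", NA),
  Positive_Test_Date = c(NA, NA, "3/29/2019", NA, NA, "4/23/2019")
)

# Within each ID/Cycle group, keep Cycle_Date where it equals the group's
# last Positive_Test_Date; non-matching rows get NA, and rows where the
# comparison is NA also come back as NA (fifelse's default for a missing test)
DT[, LH_Date := fifelse(Cycle_Date == Positive_Test_Date[.N],
                        Cycle_Date, NA_character_),
   by = .(ID, Cycle)]
```

The indexing approach touches only the matching rows, while this version computes a value for every row; which one reads better is largely a matter of taste.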
