Amer
Amer

Reputation: 2251

ID numbering carried forward based on condition in a data frame column

I have a data frame with the structure below (df). I want to modify ID2, PERIOD such that the ID2, PERIOD number is carried forward until AMT > 0 (Note when AMT > 0, the EVID is 0). So, the first two rows of ID2==13/PERIOD==2 should actually be ID2==12/PERIOD==1 as appears in the dfout data frame. Ans so on.

df <- 
ID   ID2   TIME   DVID    AMT   DV    PERIOD
 1    12     0      0      50    NA    1 
 1    12     0.5    1      0     10    1
 1    12     0.5    2      0     15    1
 1    13     600    1      0     2.5   2
 1    13     600    2      0     4     2
 1    13     600    0      100   NA    2
 1    13     602    1      0     20    2
 1    13     602    2      0     35    2
 1    14     800    1      0     5     3
 1    14     800    2      0     10    3
 1    14     800    0      50    NA    3


dfout <- 
ID   ID2   TIME   DVID    AMT   DV    PERIOD
 1    12     0      0      50    NA    1 
 1    12     0.5    1      0     10    1
 1    12     0.5    2      0     15    1
 1    12     600    1      0     2.5   1
 1    12     600    2      0     4     1
 1    13     600    0      100   NA    2
 1    13     602    1      0     20    2
 1    13     602    2      0     35    2
 1    13     800    1      0     5     2
 1    13     800    2      0     10    2
 1    14     800    0      50    NA    3

Is there a trick on how to do this in R?

Upvotes: 1

Views: 29

Answers (1)

akrun
akrun

Reputation: 887881

We can loop over the columns ("ID2", "PERIOD"), using the cumsum of the logical vector (DVID==0 & AMT > 0) to create a grouping index, we change the unique values.

df[c("ID2", "PERIOD")] <- lapply(df[c("ID2", "PERIOD")], function(x) 
               unique(x)[with(df, cumsum(DVID==0 & AMT > 0))])
df
#   ID ID2  TIME DVID AMT   DV PERIOD
#1   1  12   0.0    0  50   NA      1
#2   1  12   0.5    1   0 10.0      1
#3   1  12   0.5    2   0 15.0      1
#4   1  12 600.0    1   0  2.5      1
#5   1  12 600.0    2   0  4.0      1
#6   1  13 600.0    0 100   NA      2
#7   1  13 602.0    1   0 20.0      2
#8   1  13 602.0    2   0 35.0      2
#9   1  13 800.0    1   0  5.0      2
#10  1  13 800.0    2   0 10.0      2
#11  1  14 800.0    0  50   NA      3

Or using dplyr

library(dplyr)
df %>% 
   mutate_each(funs(unique(.)[cumsum(DVID==0 & AMT > 0)]), DV, PERIOD)

Upvotes: 2

Related Questions