swhusky

Reputation: 305

Lagging variables by day and creating new rows in the process

I'm trying to lag variables by day, but many cases have no observation on the previous day, so I need to add an extra row for that missing day in the process. dplyr gets me close, but I need a way to insert those rows, and I have many thousands of cases. Any thoughts would be much appreciated.

ID<-c(1,1,1,1,2,2)
day<-c(0,1,2,5,1,3)
v<-c(2.2,3.4,1.2,.8,6.4,2)
dat1<-as.data.frame(cbind(ID,day,v))
dat1

  ID day   v
1  1   0 2.2
2  1   1 3.4
3  1   2 1.2
4  1   5 0.8
5  2   1 6.4
6  2   3 2.0

Using dplyr gets me here:

library(dplyr)

dat2 <-
  dat1 %>%
  group_by(ID) %>%
  mutate(v.L = dplyr::lead(v, n = 1, default = NA))
dat2
     ID   day     v   v.L
1     1     0   2.2   3.4
2     1     1   3.4   1.2
3     1     2   1.2   0.8
4     1     5   0.8    NA
5     2     1   6.4   2.0
6     2     3   2.0    NA

But I need to get here:

 ID2<-c(1,1,1,1,1,2,2,2)
 day2<-c(0,1,2,4,5,1,2,3)
 v2<-c(2.2,3.4,1.2,NA,.8,6.4,NA,2)
 v2.L<-c(3.4,1.2,NA,.8,NA,NA,2,NA)
 dat3<-as.data.frame(cbind(ID2,day2,v2,v2.L))
 dat3

   ID2 day2  v2 v2.L
1   1    0 2.2  3.4
2   1    1 3.4  1.2
3   1    2 1.2   NA
4   1    4  NA  0.8
5   1    5 0.8   NA
6   2    1 6.4   NA
7   2    2  NA  2.0
8   2    3 2.0   NA

Upvotes: 1

Views: 258

Answers (1)

Lamia

Reputation: 3875

You could use complete() and full_seq() from the tidyr package to fill in the missing days for each ID, then compute the lead. Finally, remove the rows that have NA in both v and v.L:

library(dplyr)
library(tidyr)

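dat2 = dat1 %>%
       group_by(ID) %>%
       complete(day = full_seq(day, 1)) %>%   # add a row for every missing day within each ID
       mutate(v.L = lead(v)) %>%              # next row is now the next day
       filter(!(is.na(v) & is.na(v.L)))       # drop filler rows that carry no information
dat2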

    ID   day     v   v.L
  <dbl> <dbl> <dbl> <dbl>
     1     0   2.2   3.4
     1     1   3.4   1.2
     1     2   1.2    NA
     1     4    NA   0.8
     1     5   0.8    NA
     2     1   6.4    NA
     2     2    NA   2.0
     2     3   2.0    NA
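
To see why the final filter is needed, the pipeline can be split into its three steps. This is only a minimal sketch, assuming the dat1 from the question; the intermediate names filled, led and result are illustrative and not part of the code above.

```r
library(dplyr)
library(tidyr)

# Example data from the question
dat1 <- data.frame(
  ID  = c(1, 1, 1, 1, 2, 2),
  day = c(0, 1, 2, 5, 1, 3),
  v   = c(2.2, 3.4, 1.2, 0.8, 6.4, 2)
)

# full_seq() fills every gap between the smallest and largest value
full_seq(c(0, 1, 2, 5), 1)
# 0 1 2 3 4 5

# Step 1: one row per ID and day; days without an observation get v = NA
filled <- dat1 %>%
  group_by(ID) %>%
  complete(day = full_seq(day, 1)) %>%
  ungroup()

# Step 2: on a gap-free grid, the next row within an ID is the next day
led <- filled %>%
  group_by(ID) %>%
  mutate(v.L = lead(v)) %>%
  ungroup()

# Step 3: drop filler days that have neither a value nor a next-day value
# (here: day 3 for ID 1)
result <- led %>%
  filter(!(is.na(v) & is.na(v.L)))
```

Step 1 creates a row for every day in each ID's range, which is more than the question asks for; step 3 trims it back to the days that either have an observation or precede one. With long gaps between observations the intermediate table can get much larger than the input, which may matter with many thousands of cases.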

Upvotes: 2
