I'm trying to lag variables by day, but many cases don't have an observation on the previous day, so I need to add an extra row in the process. dplyr gets me close, but I need a way to add the missing rows, and I have many thousands of cases. Any thoughts would be much appreciated.
ID <- c(1, 1, 1, 1, 2, 2)
day <- c(0, 1, 2, 5, 1, 3)
v <- c(2.2, 3.4, 1.2, 0.8, 6.4, 2)
dat1 <- data.frame(ID, day, v)
dat1
  ID day   v
1  1   0 2.2
2  1   1 3.4
3  1   2 1.2
4  1   5 0.8
5  2   1 6.4
6  2   3 2.0
Using dplyr gets me here:
library(dplyr)

dat2 <- dat1 %>%
  group_by(ID) %>%
  mutate(v.L = dplyr::lead(v, n = 1, default = NA))
dat2
  ID day   v v.L
1  1   0 2.2 3.4
2  1   1 3.4 1.2
3  1   2 1.2 0.8
4  1   5 0.8  NA
5  2   1 6.4 2.0
6  2   3 2.0  NA
But I need to get here:
ID2 <- c(1, 1, 1, 1, 1, 2, 2, 2)
day2 <- c(0, 1, 2, 4, 5, 1, 2, 3)
v2 <- c(2.2, 3.4, 1.2, NA, 0.8, 6.4, NA, 2)
v2.L <- c(3.4, 1.2, NA, 0.8, NA, NA, 2, NA)
dat3 <- data.frame(ID2, day2, v2, v2.L)
dat3
  ID2 day2  v2 v2.L
1   1    0 2.2  3.4
2   1    1 3.4  1.2
3   1    2 1.2   NA
4   1    4  NA  0.8
5   1    5 0.8   NA
6   2    1 6.4   NA
7   2    2  NA  2.0
8   2    3 2.0   NA
You could use complete and full_seq from the tidyr package to fill in the missing days within each ID. This also adds days you don't want (for example, day 3 for ID 1), so at the end you need to remove the rows that have NA in both v and v.L:
library(dplyr)
library(tidyr)
dat2 <- dat1 %>%
  group_by(ID) %>%
  complete(day = full_seq(day, 1)) %>%
  mutate(v.L = lead(v)) %>%
  filter(!(is.na(v) & is.na(v.L)))
   ID   day     v   v.L
<dbl> <dbl> <dbl> <dbl>
    1     0   2.2   3.4
    1     1   3.4   1.2
    1     2   1.2    NA
    1     4    NA   0.8
    1     5   0.8    NA
    2     1   6.4    NA
    2     2    NA   2.0
    2     3   2.0    NA
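If adding tidyr is not an option, the same three steps (fill in each ID's sequence of days, take the next row's value within each ID, then drop rows that are NA in both columns) can be sketched in base R. This is a swapped-in alternative, not part of the answer above. It assumes days are whole numbers and that each ID's range runs from its first to its last observed day, which is what full_seq(day, 1) produces for this data; the names grid, full and res are illustrative only.

```r
# Example data from the question
dat1 <- data.frame(ID  = c(1, 1, 1, 1, 2, 2),
                   day = c(0, 1, 2, 5, 1, 3),
                   v   = c(2.2, 3.4, 1.2, 0.8, 6.4, 2))

# Step 1 (assumption: whole-number days): every day from each ID's
# first to last observed day, similar to complete(day = full_seq(day, 1))
grid <- do.call(rbind, lapply(split(dat1, dat1$ID), function(g) {
  data.frame(ID = g$ID[1], day = seq(min(g$day), max(g$day)))
}))

# Left-join the observed values; added days get v = NA
full <- merge(grid, dat1, by = c("ID", "day"), all.x = TRUE)
full <- full[order(full$ID, full$day), ]

# Step 2: within each ID, v.L is the next row's v (like dplyr::lead)
full$v.L <- ave(full$v, full$ID, FUN = function(x) c(x[-1], NA))

# Step 3: drop filler rows where both v and v.L are missing
res <- full[!(is.na(full$v) & is.na(full$v.L)), ]
rownames(res) <- NULL
print(res)
```

Because the data are sorted by ID and day before ave() is called, "next row" and "next day" coincide inside each ID, which is the same reason the tidyr version completes the sequence before calling lead().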