Reputation: 121
I have a dataframe with repeated measurements for individuals
ID Day Value
1 1 1
1 3 NA
1 4 3
1 5 5
After subjects hit a certain value (5 or 0.5) they disappear from the dataset. However, I want to add rows for those patients and repeat the last value until day 60
Expected output:
ID Day Value
1 1 1
1 3 NA
1 4 3
1 5 5
1 6 5
1 7 5
1 ... 5
1 60 5
There are many subjects and the same needs to happen to each of them, so when value = 5, repeat 5 till day 60 when value = 0.5, repeat 0.5 till day 60
Some subjects hit the 5 or 0.5 after 1 day, others after 59 days, and everything in between. There are also a lot of missing values, and if they are before a subject hits either 5 or 0.5 they need to stay NA.
I have no clue how to do this, I've been looking online but couldnt find this question, apologies if it is a duplicate.
If anything is unclear, please let me know, Thanks!
Upvotes: 0
Views: 872
Reputation: 269471
Using the reproducible data in the Note at the end where nday
represents the day number up to which we want -- we use 7 here but you can change it to 60. Create a function which performs the calculation for one ID. Then apply it to each ID and rbind the results together. Note that if mx
already equals nday
then since there is no else
leg on the if
, the value of more
will be NULL and rbind
with a NULL just returns the data as desired. No packages are used.
add_data <- function(data) {
mx = max(data$Day)
more <- if (mx < nday) {
with(data, data.frame(ID = ID[1], Day = seq(mx+1, nday), Value = tail(Value, 1)))
}
rbind(data, more)
}
do.call("rbind", by(DF, DF$ID, add_data))
giving:
ID Day Value
1.1 1 1 1
1.2 1 3 NA
1.3 1 4 3
1.4 1 5 5
1.5 1 6 5
1.6 1 7 5
2.5 2 1 1
2.6 2 3 NA
2.7 2 4 3
2.8 2 5 5
2.1 2 6 5
2.2 2 7 5
DF <- data.frame(ID = rep(1:2, c(4, 4)), Day = c(1, 3, 4, 5), Value = c(1, NA, 3, 5))
nday <- 7 # number of days up to which we need; replace with 60
Upvotes: 2