MC26
MC26

Reputation: 121

R: Add new rows to a dataframe when a condition is met and then repeat the final value

I have a dataframe with repeated measurements for individuals

ID    Day    Value
1      1     1
1      3     NA
1      4     3
1      5     5

After subjects hit a certain value (5 or 0.5) they disappear from the dataset. However, I want to add rows for those patients and repeat the last value until day 60

Expected output:

ID    Day    Value
1      1     1
1      3     NA
1      4     3
1      5     5
1      6     5
1      7     5
1      ...   5
1      60    5

There are many subjects and the same needs to happen to each of them, so when value = 5, repeat 5 till day 60 when value = 0.5, repeat 0.5 till day 60

Some subjects hit the 5 or 0.5 after 1 day, others after 59 days, and everything in between. There are also a lot of missing values, and if they are before a subject hits either 5 or 0.5 they need to stay NA.

I have no clue how to do this, I've been looking online but couldnt find this question, apologies if it is a duplicate.

If anything is unclear, please let me know, Thanks!

Upvotes: 0

Views: 872

Answers (1)

G. Grothendieck
G. Grothendieck

Reputation: 269471

Using the reproducible data in the Note at the end where nday represents the day number up to which we want -- we use 7 here but you can change it to 60. Create a function which performs the calculation for one ID. Then apply it to each ID and rbind the results together. Note that if mx already equals nday then since there is no else leg on the if, the value of more will be NULL and rbind with a NULL just returns the data as desired. No packages are used.

add_data <- function(data) {
  mx = max(data$Day)
  more <- if (mx < nday) {
    with(data, data.frame(ID = ID[1], Day = seq(mx+1, nday), Value = tail(Value, 1)))
  }
  rbind(data, more)
}
do.call("rbind", by(DF, DF$ID, add_data))

giving:

    ID Day Value
1.1  1   1     1
1.2  1   3    NA
1.3  1   4     3
1.4  1   5     5
1.5  1   6     5
1.6  1   7     5
2.5  2   1     1
2.6  2   3    NA
2.7  2   4     3
2.8  2   5     5
2.1  2   6     5
2.2  2   7     5

Note

DF <- data.frame(ID = rep(1:2, c(4, 4)), Day = c(1, 3, 4, 5), Value = c(1, NA, 3, 5))
nday <- 7 # number of days up to which we need; replace with 60

Upvotes: 2

Related Questions