Replacing missing values in time series data in R

Question

I am new to R and would like to replace the missing values of X in my data. How can I replace the missing values of "X" when "Time" = 1 or 2 with the value of "X" when "Time" = 3, for the same "SubID" and the same "Day"?

SubID: subject number

Day: each subject's day number (1,2,3...21)

Time: morning marked as 1, afternoon marked as 2, and evening marked as 3

X: has a valid value only when Time is 3; otherwise it is missing.

SubID Day  Time   X    
 1    1     1     NA
 1    1     2     NA
 1    1     3     7.4
 1    2     1     NA
 1    2     3     6.2
 2    1     1     NA
 2    1     2     NA
 2    1     3     7.1
 2    2     3     5.9
 2    2     2     NA
 2    2     1     NA

This is as far as I got, using the zoo package. I have very limited experience in R. Thank you in advance!

data2 <- transform(data1, x = na.aggregate(x,by=SubID,FUN=sum,na.rm = T))

josemz · Accepted Answer

Here's the explanation of my comment:

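library(data.table)
library(zoo)

# Convert data1 to a data.table in place
setDT(data1)

# Within each SubID/Day group, visit rows from Time 3 down to Time 1
# and carry the last non-missing X forward into the new column Xf
data1[order(-Time),
      Xf := na.locf(X),
      by = .(SubID, Day)]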

setDT() converts data1 to a data.table in place. order(-Time) then sorts the rows by Time in descending order (because of the -), so within each group the evening row (Time = 3) comes first. Xf := na.locf(X) creates a new column Xf by reference, which means you don't have to assign the result back to data1. na.locf() is a function in the zoo package that carries the last non-missing observation forward over the NAs that follow it; because the rows are in descending Time order, the value at Time 3 is carried into Times 2 and 1. Finally, by = .(SubID, Day) does this separately for each combination of SubID and Day.
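
To make the steps above concrete, here is a small reproducible sketch that rebuilds the sample data from the question and applies the same call. One thing differs from the code above and is my own assumption rather than part of the original answer: I pass na.rm = FALSE to na.locf(). By default na.locf() drops leading NAs, so a group with no Time 3 value could return fewer values than it has rows; with na.rm = FALSE such a group should simply stay NA.

```r
library(data.table)
library(zoo)

# Sample data copied from the question (same row order)
data1 <- data.table(
  SubID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Day   = c(1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2),
  Time  = c(1, 2, 3, 1, 3, 1, 2, 3, 3, 2, 1),
  X     = c(NA, NA, 7.4, NA, 6.2, NA, NA, 7.1, 5.9, NA, NA)
)

# Visit rows in descending Time order and, per SubID/Day group,
# carry the Time 3 value forward into Times 2 and 1.
# na.rm = FALSE is an added assumption: it keeps the result the same
# length as the group even if the group has no non-missing X.
data1[order(-Time),
      Xf := na.locf(X, na.rm = FALSE),
      by = .(SubID, Day)]

# The rows of data1 keep their original order; only the column Xf is added
print(data1)
```

On the question's data, every row should end up with its group's evening value in Xf (7.4 for subject 1 on day 1, 6.2 for subject 1 on day 2, and so on), while the original X column is left untouched.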

Hope it's clearer now. Feel free to ask if you have further questions.
