Reputation: 65
I am new to R and want to replace the missing values of X in my data. How can I replace the missing values of "X" when "Time" = 1 or 2 with the value of "X" when "Time" = 3, for the same "SubID" and the same "Day"?
SubID: subject number
Day: each subject's day number (1,2,3...21)
Time: morning marked as 1, afternoon marked as 2, and evening marked as 3
X: only has a valid value when Time is 3, others are missing.
SubID Day Time X
1 1 1 NA
1 1 2 NA
1 1 3 7.4
1 2 1 NA
1 2 3 6.2
2 1 1 NA
2 1 2 NA
2 1 3 7.1
2 2 3 5.9
2 2 2 NA
2 2 1 NA
I was able to get as far as the following code using zoo. I have very limited experience in R. Thank you in advance!
data2 <- transform(data1,
x = na.aggregate(x,by=SubID,FUN=sum,na.rm = T))
Upvotes: 0
Views: 913
Reputation: 1312
Here's the explanation of my comment:
library(data.table)
library(zoo)
setDT(data1)
data1[order(-Time),
      Xf := na.locf(X),
      by = .(SubID, Day)]
Ok, so the setDT function makes the data1 object a data.table. Then order(-Time) orders data1 with respect to Time in descending order (because of the -). Xf := na.locf(X) creates a new column Xf by reference (which means you don't have to assign this back to data1) as na.locf(X), which is a function in the zoo package that fills the NAs forward with the previous value (in this case filling 2 and 1 with the value in 3). The last line specifies that we want to do this grouped by SubID and Day.
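To make the steps above concrete, here is a minimal sketch that rebuilds the sample data from the question and applies the same call. The data.table() constructor call is my own reconstruction of the posted table, and na.rm = FALSE is an addition that is not in the code above: it is assumed here so that a group with no Time 3 value keeps its NAs instead of na.locf dropping them and returning a shorter vector.

```r
library(data.table)
library(zoo)

# Sample data reconstructed from the question (rows in the order posted)
data1 <- data.table(
  SubID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Day   = c(1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2),
  Time  = c(1, 2, 3, 1, 3, 1, 2, 3, 3, 2, 1),
  X     = c(NA, NA, 7.4, NA, 6.2, NA, NA, 7.1, 5.9, NA, NA)
)

# Visit rows in descending Time so the Time 3 value comes first in each
# (SubID, Day) group, then carry it forward over the NAs.
# na.rm = FALSE (an assumption, see above) keeps leading NAs in place.
data1[order(-Time),
      Xf := na.locf(X, na.rm = FALSE),
      by = .(SubID, Day)]

print(data1)
```

The original row order of data1 is untouched; only the new column Xf is added, and X is left as it was so you can compare the two.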
Hope it's clearer now, feel free to ask if you have further doubts.
Upvotes: 1
Reputation: 2026
You can sort the data by descending time and then use X[1].
library(dplyr)
df <- tibble(SubID = 1, Day = 1, Time = c(1, 2, 3), X = c(NA, NA, 2.2))
df <- df %>%
  group_by(SubID, Day) %>%
  arrange(desc(Time)) %>%
  mutate(
    X = case_when(
      is.na(X) ~ X[1],
      TRUE ~ X
    )
  )
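The example above has a single group, so here is a sketch of the same idea on the full sample data from the question. The tibble is my reconstruction of the posted table, and the name filled is just illustrative. I have swapped case_when for coalesce(X, first(X)), which should be equivalent here: keep X where it is present, otherwise take the first value in the group, which is the Time 3 value after sorting.

```r
library(dplyr)

# Sample data reconstructed from the question
data1 <- tibble(
  SubID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Day   = c(1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2),
  Time  = c(1, 2, 3, 1, 3, 1, 2, 3, 3, 2, 1),
  X     = c(NA, NA, 7.4, NA, 6.2, NA, NA, 7.1, 5.9, NA, NA)
)

filled <- data1 %>%
  group_by(SubID, Day) %>%
  # Sort within each group so the Time 3 row comes first
  arrange(desc(Time), .by_group = TRUE) %>%
  # Keep X where present, otherwise use the group's first value
  mutate(X = coalesce(X, first(X))) %>%
  ungroup()

print(filled)
```

Note that this relies on every (SubID, Day) group having a Time 3 row; a group without one would get the value from its latest available Time instead, which is NA in data shaped like the question's.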
Upvotes: 1