Reputation: 65
I am new to R. I was hoping to replace the missing values for X in the data. How can I replace the missing values of "X" when "Time" = 1 and 2 with the value of "X" when "Time" = 3 for the same "ID" and the same "Day"
X: only has a valid value when Time is 3, others are missing.
ID Day Time X
1 1 1 NA
1 1 2 NA
1 1 3 7.4
1 2 1 NA
1 2 3 6.2
2 1 1 NA
2 1 2 NA
2 1 3 7.1
2 2 3 5.9
2 2 2 NA
2 2 1 NA
I was able to write and run the following codes with the package zoo and data.table, but afterward when I checked the data, it didn't work--the X value still has all the missing values when time=1 and 2. Anything wrong with my code? Any suggestions? I have very limited experience in R. Thank you in advance!
setDT(data1)
data1 <- data1 [order(-Time),
X := na.locf(X),
by = .(ID, Day)]
Upvotes: 1
Views: 312
Reputation: 269526
In the example in the question each group has at most one non-missing value and in every case that value is to be assigned to every value in that group so if that is the general case the rule reduces to assigning the single non-NA value to all X values in the group:
data1[, X := na.omit(X), .(ID, Day)]
Upvotes: 1
Reputation: 388962
We can use replace
:
setDT(data1)
data1[, X := replace(X, is.na(X), X[Time == 3]), .(ID, Day)]
# ID Day Time X
# 1: 1 1 1 7.4
# 2: 1 1 2 7.4
# 3: 1 1 3 7.4
# 4: 1 2 1 6.2
# 5: 1 2 3 6.2
# 6: 2 1 1 7.1
# 7: 2 1 2 7.1
# 8: 2 1 3 7.1
# 9: 2 2 3 5.9
#10: 2 2 2 5.9
#11: 2 2 1 5.9
Similarly using dplyr
.
library(dplyr)
data1 %>% group_by(ID, Day) %>% mutate(X =replace(X, is.na(X), X[Time == 3]))
If you have more than 1 value in a group where Time == 3
use which.max
data1[, X := replace(X, is.na(X), X[which.max(Time == 3)]), .(ID, Day)]
When there is no value for Time == 3
we can use :
data1[, X := if(any(Time == 3)) replace(X, is.na(X), X[which.max(Time == 3)]), .(ID, Day)]
Upvotes: 1
Reputation: 25225
Here are 2 options:
1) Using rolling join in data.table
:
DT[, X1 :=
DT[!is.na(X)][.SD, on=.(ID, Day, Time), roll=-Inf, x.X]
]
2) using data.table::nafill
with "locf" by ID and Day after sorting by ID, Day and descending Time:
DT[order(ID, Day, -Time), X2 := nafill(X, "locf"), .(ID, Day)]
output:
ID Day Time X X1 X2
1: 1 1 1 NA 7.4 7.4
2: 1 1 2 NA 7.4 7.4
3: 1 1 3 7.4 7.4 7.4
4: 1 2 1 NA 6.2 6.2
5: 1 2 3 6.2 6.2 6.2
6: 2 1 1 NA 7.1 7.1
7: 2 1 2 NA 7.1 7.1
8: 2 1 3 7.1 7.1 7.1
9: 2 2 3 5.9 5.9 5.9
10: 2 2 2 NA 5.9 5.9
11: 2 2 1 NA 5.9 5.9
12: 2 3 1 NA NA NA
13: 2 3 2 NA NA NA
14: 2 3 3 NA NA NA
data:
library(data.table)
DT <- fread("ID Day Time X
1 1 1 NA
1 1 2 NA
1 1 3 7.4
1 2 1 NA
1 2 3 6.2
2 1 1 NA
2 1 2 NA
2 1 3 7.1
2 2 3 5.9
2 2 2 NA
2 2 1 NA
2 3 1 NA
2 3 2 NA
2 3 3 NA")
Upvotes: 2