Susan
Susan

Reputation: 65

R replacing missing values with na.locf

I am new to R. I was hoping to replace the missing values for X in the data. How can I replace the missing values of "X" when "Time" = 1 and 2 with the value of "X" when "Time" = 3 for the same "ID" and the same "Day"

X: only has a valid value when Time is 3, others are missing.

 ID   Day  Time   X    
 1    1     1     NA
 1    1     2     NA
 1    1     3     7.4
 1    2     1     NA
 1    2     3     6.2
 2    1     1     NA
 2    1     2     NA
 2    1     3     7.1
 2    2     3     5.9
 2    2     2     NA
 2    2     1     NA

I was able to write and run the following codes with the package zoo and data.table, but afterward when I checked the data, it didn't work--the X value still has all the missing values when time=1 and 2. Anything wrong with my code? Any suggestions? I have very limited experience in R. Thank you in advance!

setDT(data1) data1 <- data1 [order(-Time), X := na.locf(X), by = .(ID, Day)]

Upvotes: 1

Views: 312

Answers (3)

G. Grothendieck
G. Grothendieck

Reputation: 269526

In the example in the question each group has at most one non-missing value and in every case that value is to be assigned to every value in that group so if that is the general case the rule reduces to assigning the single non-NA value to all X values in the group:

data1[, X := na.omit(X), .(ID, Day)]

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388962

We can use replace :

setDT(data1)
data1[, X := replace(X, is.na(X), X[Time == 3]), .(ID, Day)]

#    ID Day Time   X
# 1:  1   1    1 7.4
# 2:  1   1    2 7.4
# 3:  1   1    3 7.4
# 4:  1   2    1 6.2
# 5:  1   2    3 6.2
# 6:  2   1    1 7.1
# 7:  2   1    2 7.1
# 8:  2   1    3 7.1
# 9:  2   2    3 5.9
#10:  2   2    2 5.9
#11:  2   2    1 5.9

Similarly using dplyr.

library(dplyr)
data1 %>% group_by(ID, Day) %>% mutate(X =replace(X, is.na(X), X[Time == 3]))

If you have more than 1 value in a group where Time == 3 use which.max

data1[, X := replace(X, is.na(X), X[which.max(Time == 3)]), .(ID, Day)]

When there is no value for Time == 3 we can use :

data1[, X := if(any(Time == 3)) replace(X, is.na(X), X[which.max(Time == 3)]), .(ID, Day)]

Upvotes: 1

chinsoon12
chinsoon12

Reputation: 25225

Here are 2 options:

1) Using rolling join in data.table:

DT[, X1 :=
    DT[!is.na(X)][.SD, on=.(ID, Day, Time), roll=-Inf, x.X]
]

2) using data.table::nafill with "locf" by ID and Day after sorting by ID, Day and descending Time:

DT[order(ID, Day, -Time), X2 := nafill(X, "locf"), .(ID, Day)]

output:

    ID Day Time   X  X1  X2
 1:  1   1    1  NA 7.4 7.4
 2:  1   1    2  NA 7.4 7.4
 3:  1   1    3 7.4 7.4 7.4
 4:  1   2    1  NA 6.2 6.2
 5:  1   2    3 6.2 6.2 6.2
 6:  2   1    1  NA 7.1 7.1
 7:  2   1    2  NA 7.1 7.1
 8:  2   1    3 7.1 7.1 7.1
 9:  2   2    3 5.9 5.9 5.9
10:  2   2    2  NA 5.9 5.9
11:  2   2    1  NA 5.9 5.9
12:  2   3    1  NA  NA  NA
13:  2   3    2  NA  NA  NA
14:  2   3    3  NA  NA  NA

data:

library(data.table)
DT <- fread("ID   Day  Time   X    
1    1     1     NA
1    1     2     NA
1    1     3     7.4
1    2     1     NA
1    2     3     6.2
2    1     1     NA
2    1     2     NA
2    1     3     7.1
2    2     3     5.9
2    2     2     NA
2    2     1     NA
2 3 1 NA
2 3 2 NA
2 3 3 NA")

Upvotes: 2

Related Questions