Reputation: 1325
I have two data tables with similar structure (same number of rows same variables). I'd like to update the value in the first data frame using the values in the second, unless the values are undefined. Here is a reproducible example (using data.table, but the idea is the same):
library(data.table)
DT1 <- data.table(Categ = c("A", "B", "C"),
x = c(12,50,34),
y= c(5,2,8))
DT2 <- data.table(Categ = c("A", "B", "C"),
x = c(30,NA,NA),
y= c(1,NA,6))
My solution is rather simple, but is not scalable at all - if new columns are added, the code must be ammended. Also, it requires one row of code per column:
setnames(DT2, c("x","y"), c("x_new","y_new"))
DT3 <- merge(DT1,DT2,by='Categ')
DT3[!is.na(x_new), x:=x_new]
DT3[!is.na(y_new), y:=y_new]
Expected output, just to make sure I explained this right:
Categ x y
A 30 1
B 50 2
C 34 6
Upvotes: 2
Views: 100
Reputation: 118799
Of course we need to identify which rows for each column has NAs. There's no getting around it AFAICT. With that in mind, this is what was able to think of (a variation of @akrun's solution really):
# get DT1's matching indices for each row of DT2, handle multiple matches as well
idx = DT1[DT2, which = TRUE, on = "Categ", mult = "first"]
for (col in c("x", "y")) {
nas = which(is.na(DT2[[col]]))
this_idx = idx[nas]
set(DT2, i = nas, j = col, value = DT1[[col]][this_idx])
}
This assumes identical column names in both data tables.
Upvotes: 4
Reputation: 24188
Using data.frame
's the following approach is possible:
# Convert to df
DF1 <- data.frame(DT1)
DF2 <- data.frame(DT2)
# Extract indices of NA values
ind <- which(is.na(DF2), arr.ind = T)
# Replace
DF2[ind] <- DF1[ind]
DF2
# Categ x y
#1 A 30 1
#2 B 50 2
#3 C 34 6
Upvotes: 2