Update data frame / table from data with similar structure

Question

I have two data tables with similar structure (same number of rows same variables). I'd like to update the value in the first data frame using the values in the second, unless the values are undefined. Here is a reproducible example (using data.table, but the idea is the same):

library(data.table)

DT1 <- data.table(Categ = c("A", "B", "C"),
                  x = c(12,50,34),
                  y= c(5,2,8))

DT2 <- data.table(Categ = c("A", "B", "C"),
                  x = c(30,NA,NA),
                  y= c(1,NA,6))

My solution is rather simple, but is not scalable at all - if new columns are added, the code must be ammended. Also, it requires one row of code per column:

setnames(DT2, c("x","y"), c("x_new","y_new"))
DT3 <- merge(DT1,DT2,by='Categ')

DT3[!is.na(x_new), x:=x_new]
DT3[!is.na(y_new), y:=y_new]

Expected output, just to make sure I explained this right:

Categ    x  y
    A   30  1
    B   50  2
    C   34  6

Arun · Accepted Answer

Of course we need to identify which rows for each column has NAs. There's no getting around it AFAICT. With that in mind, this is what was able to think of (a variation of @akrun's solution really):

# get DT1's matching indices for each row of DT2, handle multiple matches as well
idx = DT1[DT2, which = TRUE, on = "Categ", mult = "first"]
for (col in c("x", "y")) {
    nas = which(is.na(DT2[[col]]))
    this_idx = idx[nas]
    set(DT2, i = nas, j = col, value = DT1[[col]][this_idx])
}

This assumes identical column names in both data tables.

Update data frame / table from data with similar structure

Answers (2)

Related Questions