TiberiusGracchus2020
TiberiusGracchus2020

Reputation: 409

Fill NA values in one data table with observed values from a second data table in R

I can't believe I'm having this much trouble finding a solution to this problem: I have two data tables with identical rows and columns that look like this:

Country <- c("FRA", "FRA", "DEU", "DEU", "CHE", "CHE")
Year <- c(2010, 2020, 2010, 2020, 2010, 2020)
acctm <- c(20, 30, 10, NA, 20, NA)
acctf <- c(20, NA, 15, NA, 40, NA)

dt1 <- data.table(Country, Year, acctm, acctf)

   Country Year acctm acctf
1      FRA 2010    20    20
2      FRA 2020    30    NA
3      DEU 2010    10    15
4      DEU 2020    NA    NA
5      CHE 2010    20    40
6      CHE 2020    NA    NA

Country <- c("FRA", "FRA", "DEU", "DEU", "CHE", "CHE")
Year <- c(2010, 2020, 2010, 2020, 2010, 2020)
acctm <- c(1, 1, 1, 60, 1, 70)
acctf <- c(1, 60, 1, 80, 1, 100)

dt2 <- data.table(Country, Year, acctm, acctf)

   Country Year acctm acctf
1      FRA 2010    1     1
2      FRA 2020    2    60
3      DEU 2010    1     1
4      DEU 2020    60   80
5      CHE 2010    1     2
6      CHE 2020    70  100

I need to create a new data table that replaces NA values in dt1 with values for the corresponding country/year/variable match from dt2, yielding a table that looks like this:

   Country Year acctm acctf
1      FRA 2010    20    20
2      FRA 2020    30    60
3      DEU 2010    10    15
4      DEU 2020    60    80
5      CHE 2010    20    40
6      CHE 2020    70   100

Upvotes: 2

Views: 262

Answers (1)

akrun
akrun

Reputation: 887251

We can do this with a join on the 'Country', 'Year' columns

library(data.table)
nm1 <- names(dt1)[3:4]
nm2 <- paste0("i.", nm1)
dt3 <- copy(dt1)
dt3[dt2, (nm1) := Map(function(x, y) 
   fifelse(is.na(x), y, x), mget(nm1), mget(nm2)), on = .(Country, Year)]
dt3
#   Country Year acctm acctf
#1:     FRA 2010    20    20
#2:     FRA 2020    30    60
#3:     DEU 2010    10    15
#4:     DEU 2020    60    80
#5:     CHE 2010    20    40
#6:     CHE 2020    70   100

Or to make this compact, use fcoalesce from data.table (comments from @IceCreamToucan)

dt3[dt2,  (nm1) := Map(fcoalesce, mget(nm1), mget(nm2)), on = .(Country, Year)]

If the datasets are of same dimensions and have the same values for 'Country', 'Year', then another option is

library(purrr)
library(dplyr)
list(dt1[, .(acctm, acctf)], dt2[, .(acctm, acctf)]) %>% 
      reduce(coalesce) %>%
      bind_cols(dt1[, .(Country, Year)], .)

Upvotes: 3

Related Questions