Merge two dataframes by time and variable ID with NAs

Question

Im currently working with some currencies in R and want to merge or override two datasets to create one.

For all the currencies, I have data from 1980 to 2017 in DF1. For 16 of them I also have data starting from different places between 1970 and 1975 to 2017 in DF2. What I want to do is to put the 1970-1980 part from DF2 on top of DF1. I guess I would achieve the same result if I managed to merge DF1 into DF2 for the cells where they both have values (so DF1 overrides DF2). However, the starting dates are not exactly equal for all currencies, so I can't just hardcode it.

Here is an example to show you what it looks like: Dateis the time variable (monthly data). DF1 would be corresponding to the 1980-2017 data and DF2 would correspond to the 1970-2017 data. My objective is to override the values from DF1 into DF2 based on both column and row ID. DF3 is the desired output, without the NAs from DF1, just the values from DF2 instead.

set.seed(1234)
DF1=data.frame(matrix(data=c(c(4:9),rnorm(30)),6,6))
set.seed(4321)
DF2=data.frame(matrix(data=c(c(1:12),rnorm(36)),12,4))
names(DF1)=c("Date","Currency1","Currency3","Currency6","Currency7","Currency8")
names(DF2)=c("Date","Currency1","Currency2","Currency3")
DF1$Currency3[1:2]=NA
DF1$Currency1[4:5]=NA

> DF1
  Date  Currency1  Currency3   Currency6  Currency7  Currency8
1    4 -1.2070657         NA -0.77625389 -0.8371717 -0.6937202
2    5  0.2774292         NA  0.06445882  2.4158352 -1.4482049
3    6  1.0844412 -0.5644520  0.95949406  0.1340882  0.5747557
4    7         NA -0.8900378 -0.11028549 -0.4906859 -1.0236557
5    8         NA -0.4771927 -0.51100951 -0.4405479 -0.0151383
6    9  0.5060559 -0.9983864 -0.91119542  0.4595894 -0.9359486

> DF2
   Date   Currency1    Currency2   Currency3
1     1 -0.42675738 -1.260985237 -0.09920208
2     2 -0.22361182  1.139464085 -0.23803425
3     3  0.71760679 -1.221781923  0.04778266
4     4  0.84144567  1.573315888  0.29651274
5     5 -0.12835727  0.073477874 -0.83380992
6     6  1.60934721 -1.175115087 -1.37397000
7     7 -0.29716745 -1.588261899  0.14027895
8     8  0.19600465 -0.747380729  0.66212596
9     9  1.24074620  0.483521864  1.13103967
10   10 -0.71869815 -0.003025539 -0.47511202
11   11 -0.06723632 -0.008930402  0.85241411
12   12  0.34436710  0.593357619 -0.75151885

I got this answer from a user, but I encountered a new issue, some of the data in DF1 has NAs in it, which this code overrides into DF2.

library(data.table)
DF3 <- copy(DF2)
nm1 <- names(DF1)[-1]
setDT(DF3)[DF1, (nm1) := mget(paste0("i.", nm1)), on = .(Date)]

    > DF3
    Date  Currency1  Currency2    Currency3   Currency6  Currency7  Currency8
 1:    1  1.1022975 -0.8553646 -0.162309524          NA         NA         NA
 2:    2 -0.4755931 -0.2806230  0.563055819          NA         NA         NA
 3:    3 -0.7094400 -0.9943401  1.647817473          NA         NA         NA
 4:    4 -1.2070657 -0.9685143           NA -0.77625389 -0.8371717 -0.6937202
 5:    5  0.2774292 -1.1073182           NA  0.06445882  2.4158352 -1.4482049
 6:    6  1.0844412 -1.2519859 -0.564451999  0.95949406  0.1340882  0.5747557
 7:    7         NA -0.5238281 -0.890037829 -0.11028549 -0.4906859 -1.0236557
 8:    8         NA -0.4968500 -0.477192700 -0.51100951 -0.4405479 -0.0151383
 9:    9  0.5060559 -1.8060313 -0.998386445 -0.91119542  0.4595894 -0.9359486
10:   10 -0.4658975 -0.5820759 -0.669633580          NA         NA         NA
11:   11  1.4494963 -1.1088896 -0.007604756          NA         NA         NA
12:   12 -1.0686427 -1.0149620  1.777084448          NA         NA         NA

Jaap · Accepted Answer

To replace only the values in DF3 that have non-missing counterparts in DF1 you could use ifelse.

Using:

DF3 <- copy(DF2)
nm1 <- names(DF1)[-1]
nm2 <- names(DF2)
setDT(DF3)[DF1, (nm1) := {n <- seq_along(nm1);
                          lapply(n, function(i) ifelse(is.na(get(paste0("i.", nm1[i]))) & nm1[i] %in% nm2, 
                                                       get(paste0("x.", nm1[i])), 
                                                       get(paste0("i.", nm1[i]))))},
           on = .(Date)]

you get:

> DF3
    Date   Currency1    Currency2   Currency3   Currency6  Currency7  Currency8
 1:    1 -0.42675738 -1.260985237 -0.09920208          NA         NA         NA
 2:    2 -0.22361182  1.139464085 -0.23803425          NA         NA         NA
 3:    3  0.71760679 -1.221781923  0.04778266          NA         NA         NA
 4:    4 -1.20706575  1.573315888  0.29651274 -0.77625389 -0.8371717 -0.6937202
 5:    5  0.27742924  0.073477874 -0.83380992  0.06445882  2.4158352 -1.4482049
 6:    6  1.08444118 -1.175115087 -0.56445200  0.95949406  0.1340882  0.5747557
 7:    7 -0.29716745 -1.588261899 -0.89003783 -0.11028549 -0.4906859 -1.0236557
 8:    8  0.19600465 -0.747380729 -0.47719270 -0.51100951 -0.4405479 -0.0151383
 9:    9  0.50605589  0.483521864 -0.99838644 -0.91119542         NA -0.9359486
10:   10 -0.71869815 -0.003025539 -0.47511202          NA         NA         NA
11:   11 -0.06723632 -0.008930402  0.85241411          NA         NA         NA
12:   12  0.34436710  0.593357619 -0.75151885          NA         NA         NA

Used data:

set.seed(1234)
DF1=data.frame(matrix(data=c(c(4:9),rnorm(30)),6,6))
set.seed(4321)
DF2=data.frame(matrix(data=c(c(1:12),rnorm(36)),12,4))
names(DF1)=c("Date","Currency1","Currency3","Currency6","Currency7","Currency8")
names(DF2)=c("Date","Currency1","Currency2","Currency3")
DF1$Currency3[1:2]=NA
DF1$Currency1[4:5]=NA
DF1$Currency7[nrow(DF1)]=NA

Merge two dataframes by time and variable ID with NAs

Answers (2)

Related Questions