Filling in missing rows from two data frames

Question

I have two large data sets: an old one and a new one that is largely the same. Compared with the old data set, the new one has new rows, updated Value entries, and some missing rows. I would like to keep everything in the new data set and add the rows (combinations of Date and Code) that are present only in the old data set. The order is not important.

Old data set:

              Date Code Value 
        2015-10-01   1   145
        2015-10-01   1   175 
        2015-11-01   6   112 
        2015-12-01   2   160 
        2016-01-01   6   124
        2016-01-01   6   572
        2016-02-01   5   160 
        2016-02-01   1   574

New data set:

              Date Code Value 
        2015-10-01   1   145
        2015-10-01   2   1452
        2015-11-01   6   125 
        2015-12-01   2   160 
        2016-01-01   6   1501
        2016-01-01   6   572
        2016-03-01   9   452
        2016-03-01   7   500

Output:

              Date Code Value 
        2015-10-01   1   145
        2015-10-01   2   1452
        2015-11-01   6   125 
        2015-12-01   2   160 
        2016-01-01   6   1501
        2016-01-01   6   572
        2016-03-01   9   452
        2016-03-01   7   500
        2015-10-01   1   175 
        2016-02-01   5   160 
        2016-02-01   1   574

When the new data set has no matching combination of Date and Code, the corresponding row from the old data set should be added. In the output, the last three rows come from the old data set. I have looked at several other posts but could not find what I need.

ArunK · Accepted Answer

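You can use the anti_join function from the dplyr library to find all the rows in old_df that don't exist in new_df. Note that matching on all three columns also treats an old row as missing when only its value has changed.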

library(dplyr)

# Rows of old_df with no exact (date, code, value) match in new_df.
# Column names are lowercase here; adjust `by` to match your own data.
df <- anti_join(old_df, new_df, by = c("date", "code", "value"))
df
        date code value
1 2016-01-01    6   124
2 2016-02-01    1   574
3 2016-02-01    5   160
4 2015-10-01    1   175
5 2015-11-01    6   112

# Combine those rows with everything in new_df.
final_df <- full_join(df, new_df, by = c("date", "code", "value"))
final_df
         date code value
1  2016-01-01    6   124
2  2016-02-01    1   574
3  2016-02-01    5   160
4  2015-10-01    1   175
5  2015-11-01    6   112
6  2015-10-01    1   145
7  2015-10-01    2  1452
8  2015-11-01    6   125
9  2015-12-01    2   160
10 2016-01-01    6  1501
11 2016-01-01    6   572
12 2016-03-01    9   452
13 2016-03-01    7   500
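
The result above has 13 rows, while the output requested in the question has 11. The two extra rows (2016-01-01 / 6 / 124 and 2015-11-01 / 6 / 112) are old values that the new data set has since updated. One possible way to get the requested 11 rows is sketched below. It assumes that repeated Date/Code pairs should be matched by their position within each pair, which is an interpretation of the question and not something the asker stated. The `occurrence` column and the `add_occurrence` helper are hypothetical names introduced only for this illustration.

```r
library(dplyr)

# Sample data copied from the question.
old_df <- data.frame(
  Date  = as.Date(c("2015-10-01", "2015-10-01", "2015-11-01", "2015-12-01",
                    "2016-01-01", "2016-01-01", "2016-02-01", "2016-02-01")),
  Code  = c(1, 1, 6, 2, 6, 6, 5, 1),
  Value = c(145, 175, 112, 160, 124, 572, 160, 574)
)
new_df <- data.frame(
  Date  = as.Date(c("2015-10-01", "2015-10-01", "2015-11-01", "2015-12-01",
                    "2016-01-01", "2016-01-01", "2016-03-01", "2016-03-01")),
  Code  = c(1, 2, 6, 2, 6, 6, 9, 7),
  Value = c(145, 1452, 125, 160, 1501, 572, 452, 500)
)

# Hypothetical helper: number the rows within each Date/Code pair (1, 2, ...),
# so that the second occurrence of a pair in old_df can only match a
# second occurrence in new_df.
add_occurrence <- function(df) {
  df %>%
    group_by(Date, Code) %>%
    mutate(occurrence = row_number()) %>%
    ungroup()
}

# Old rows whose Date/Code/occurrence has no counterpart in new_df.
# Value is deliberately left out of `by`, so updated values do not count
# as missing rows.
missing_rows <- anti_join(
  add_occurrence(old_df),
  add_occurrence(new_df),
  by = c("Date", "Code", "occurrence")
) %>%
  select(-occurrence)

# Keep everything in new_df and append the missing old rows.
result <- bind_rows(new_df, missing_rows)
```

On the sample data, `missing_rows` holds the three old rows with Value 175, 160 and 574, and `result` has 11 rows. If duplicates within a Date/Code pair do not matter for your data, a plain `anti_join(old_df, new_df, by = c("Date", "Code"))` is simpler, but on this sample it would not bring back the 175 row.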
