noblabla
noblabla

Reputation: 113

Filling missing rows two data frames

I have two large data sets. One is an old one and the second one is about the same as the first one. The difference is that the second one has new rows, updated Value and missing rows compared to the first data set. I would like to keep everything in the new data set and fill it with the missing rows (combinations of Date and Code) that are present in the old data set. The order is not important.

Old data set:

              Date Code Value 
        2015-10-01   1   145
        2015-10-01   1   175 
        2015-11-01   6   112 
        2015-12-01   2   160 
        2016-01-01   6   124
        2016-01-01   6   572
        2016-02-01   5   160 
        2016-02-01   1   574

New data set:

              Date Code Value 
        2015-10-01   1   145
        2015-10-01   2   1452
        2015-11-01   6   125 
        2015-12-01   2   160 
        2016-01-01   6   1501
        2016-01-01   6   572
        2016-03-01   9   452
        2016-03-01   7   500

Output:

              Date Code Value 
        2015-10-01   1   145
        2015-10-01   2   1452
        2015-11-01   6   125 
        2015-12-01   2   160 
        2016-01-01   6   1501
        2016-01-01   6   572
        2016-03-01   9   452
        2016-03-01   7   500
        2015-10-01   1   175 
        2016-02-01   5   160 
        2016-02-01   1   574

When there is no matching combination of Date and Code the corresponding row from the old data set should be added. In the output, the last three rows come from the old data set. I have looked at different posts without luck to find what I need.

Upvotes: 2

Views: 62

Answers (2)

ArunK
ArunK

Reputation: 1781

You can use the anti_join function from the dplyr library to find all the rows in the old_df that doesn't exist in new_df

df <- anti_join(old_df,new_df,by=c("date","code","value"))
        date code value
1 2016-01-01    6   124
2 2016-02-01    1   574
3 2016-02-01    5   160
4 2015-10-01    1   175
5 2015-11-01    6   112
final_df <- full_join(df,new_df,by=c("date","code","value"))
         date code value
1  2016-01-01    6   124
2  2016-02-01    1   574
3  2016-02-01    5   160
4  2015-10-01    1   175
5  2015-11-01    6   112
6  2015-10-01    1   145
7  2015-10-01    2  1452
8  2015-11-01    6   125
9  2015-12-01    2   160
10 2016-01-01    6  1501
11 2016-01-01    6   572
12 2016-03-01    9   452
13 2016-03-01    7   500

Upvotes: 0

Ernest A
Ernest A

Reputation: 7839

You can do this

key.new <- with(newds, interaction(Date, Code, Value))
key.old <- with(oldds, interaction(Date, Code, Value))

then

ind <- match(key.old, key.new, nomatch=0) == 0

gives you the rows from the old data set that should be added to the new one, if I understand the question correctly.

> oldds[ind, ]
        Date Code Value
2 2015-10-01    1   175
3 2015-11-01    6   112
5 2016-01-01    6   124
7 2016-02-01    5   160
8 2016-02-01    1   574

As an aside note, it's recommended that you provide a minimal reproducible example, including the commands to recreate the data that you're working with e.g. the output of dput(oldds) and dput(newds) or

oldds <-
read.table(text='
          Date Code Value 
    2015-10-01    1   145
    2015-10-01    1   175 
    2015-11-01    6   112 
    2015-12-01    2   160 
    2016-01-01    6   124
    2016-01-01    6   572
    2016-02-01    5   160 
    2016-02-01    1   574
', header=TRUE)

newds <-
read.table(text='
          Date Code Value 
    2015-10-01    1   145
    2015-10-01    2  1452
    2015-11-01    6   125 
    2015-12-01    2   160 
    2016-01-01    6  1501
    2016-01-01    6   572
    2016-03-01    9   452
    2016-03-01    7   500
', header=TRUE)

Upvotes: 1

Related Questions