Reputation: 113
I have two large data sets. One is an old one and the second one is about the same as the first one. The difference is that the second one has new rows, updated Value
and missing rows compared to the first data set.
I would like to keep everything in the new data set and fill it with the missing rows (combinations of Date
and Code
) that are present in the old data set. The order is not important.
Old data set:
Date Code Value
2015-10-01 1 145
2015-10-01 1 175
2015-11-01 6 112
2015-12-01 2 160
2016-01-01 6 124
2016-01-01 6 572
2016-02-01 5 160
2016-02-01 1 574
New data set:
Date Code Value
2015-10-01 1 145
2015-10-01 2 1452
2015-11-01 6 125
2015-12-01 2 160
2016-01-01 6 1501
2016-01-01 6 572
2016-03-01 9 452
2016-03-01 7 500
Output:
Date Code Value
2015-10-01 1 145
2015-10-01 2 1452
2015-11-01 6 125
2015-12-01 2 160
2016-01-01 6 1501
2016-01-01 6 572
2016-03-01 9 452
2016-03-01 7 500
2015-10-01 1 175
2016-02-01 5 160
2016-02-01 1 574
When there is no matching combination of Date
and Code
the corresponding row from the old data set should be added.
In the output, the last three rows come from the old data set. I have looked at different posts without luck to find what I need.
Upvotes: 2
Views: 62
Reputation: 1781
You can use the anti_join function from the dplyr library to find all the rows in the old_df that doesn't exist in new_df
df <- anti_join(old_df,new_df,by=c("date","code","value"))
date code value
1 2016-01-01 6 124
2 2016-02-01 1 574
3 2016-02-01 5 160
4 2015-10-01 1 175
5 2015-11-01 6 112
final_df <- full_join(df,new_df,by=c("date","code","value"))
date code value
1 2016-01-01 6 124
2 2016-02-01 1 574
3 2016-02-01 5 160
4 2015-10-01 1 175
5 2015-11-01 6 112
6 2015-10-01 1 145
7 2015-10-01 2 1452
8 2015-11-01 6 125
9 2015-12-01 2 160
10 2016-01-01 6 1501
11 2016-01-01 6 572
12 2016-03-01 9 452
13 2016-03-01 7 500
Upvotes: 0
Reputation: 7839
You can do this
key.new <- with(newds, interaction(Date, Code, Value))
key.old <- with(oldds, interaction(Date, Code, Value))
then
ind <- match(key.old, key.new, nomatch=0) == 0
gives you the rows from the old data set that should be added to the new one, if I understand the question correctly.
> oldds[ind, ]
Date Code Value
2 2015-10-01 1 175
3 2015-11-01 6 112
5 2016-01-01 6 124
7 2016-02-01 5 160
8 2016-02-01 1 574
As an aside note, it's recommended that you provide a minimal reproducible
example, including the commands to recreate the data that you're working
with e.g. the output of dput(oldds)
and dput(newds)
or
oldds <-
read.table(text='
Date Code Value
2015-10-01 1 145
2015-10-01 1 175
2015-11-01 6 112
2015-12-01 2 160
2016-01-01 6 124
2016-01-01 6 572
2016-02-01 5 160
2016-02-01 1 574
', header=TRUE)
newds <-
read.table(text='
Date Code Value
2015-10-01 1 145
2015-10-01 2 1452
2015-11-01 6 125
2015-12-01 2 160
2016-01-01 6 1501
2016-01-01 6 572
2016-03-01 9 452
2016-03-01 7 500
', header=TRUE)
Upvotes: 1