Reputation: 35
I want to merge the df OldData and NewData. In this case, Nov-2015 and Dec 2015 are present in both df. Since NewData is the most accurate update available, I want to update the value of Nov-2015 and Dec 2015 using the value in df NewData and of course adding the records of Jan-2016 and Feb-2016 as well.
Can anyone help?
OldData
Month Value
1 Jan-2015 3
2 Feb-2015 76
3 Mar-2015 31
4 Apr-2015 45
5 May-2015 99
6 Jun-2015 95
7 Jul-2015 18
8 Aug-2015 97
9 Sep-2015 61
10 Oct-2015 7
11 Nov-2015 42
12 Dec-2015 32
NewData
Month Value
1 Nov-2015 88
2 Dec-2015 45
3 Jan-2016 32
4 Feb-2016 11
Here is the output I want:
JoinData
Month Value
1 Jan-2015 3
2 Feb-2015 76
3 Mar-2015 31
4 Apr-2015 45
5 May-2015 99
6 Jun-2015 95
7 Jul-2015 18
8 Aug-2015 97
9 Sep-2015 61
10 Oct-2015 7
11 Nov-2015 88
12 Dec-2015 45
13 Jan-2016 32
14 Feb-2016 11
Thanks for @akrun, the problem is solved, and the following code works smoothly!!
rbindlist(list(OldData, NewData))[!duplicated(Month, fromLast=TRUE)]
Update: Now, let's upgrade our problem little bit. suppose our OldData and NewData have another column called "Type". How do we merge/update it this time?
> OldData
Month Type Value
1 2015-01 A 3
2 2015-02 A 76
3 2015-03 A 31
4 2015-04 A 45
5 2015-05 A 99
6 2015-06 A 95
7 2015-07 A 18
8 2015-08 A 97
9 2015-09 A 61
10 2015-10 A 7
11 2015-11 B 42
12 2015-12 C 32
13 2015-12 D 77
> NewData
Month Type Value
1 2015-11 A 88
2 2015-12 C 45
3 2015-12 D 22
4 2016-01 A 32
5 2016-02 A 11
The JoinData will suppose to update all value from NewData ass following:
> JoinData
Month Type Value
1 2015-01 A 3
2 2015-02 A 76
3 2015-03 A 31
4 2015-04 A 45
5 2015-05 A 99
6 2015-06 A 95
7 2015-07 A 18
8 2015-08 A 97
9 2015-09 A 61
10 2015-10 A 7
11 2015-11 B 42
12 2015-11 A 88 (originally not included, added from the NewData)
12 2015-12 C 45 (Updated the value by NewData)
13 2015-12 D 22 (Updated the value by NewData)
14 2016-01 A 32 (newly added from NewData)
15 2016-02 A 11 (newly added from NewData)
Thanks for @akrun: I have got the solution here for the second question as well. Thanks for the help for everyone here! Here is the answer:
d1 <- merge(OldData, NewData, by = c("Month","Type"), all = TRUE);d2 <- transform(d1, Value.x= ifelse(!is.na(Value.y), Value.y, Value.x))[-4];d2[!duplicated(d2[1:2], fromLast=TRUE),]
Upvotes: 1
Views: 1176
Reputation: 887058
Here is an option using data.table
(similar approach as @thelatemail mentioned in the comments)
library(data.table)
rbindlist(list(OldData, NewData))[!duplicated(Month, fromLast=TRUE)]
Or
rbindlist(list(OldData, NewData))[,if(.N >1) .SD[.N] else .SD, Month]
Upvotes: 1