atsang01
atsang01

Reputation: 35

How to update and replace part of old data

I want to merge the df OldData and NewData. In this case, Nov-2015 and Dec 2015 are present in both df. Since NewData is the most accurate update available, I want to update the value of Nov-2015 and Dec 2015 using the value in df NewData and of course adding the records of Jan-2016 and Feb-2016 as well.

Can anyone help?

OldData
      Month Value
1  Jan-2015     3
2  Feb-2015    76
3  Mar-2015    31
4  Apr-2015    45
5  May-2015    99
6  Jun-2015    95
7  Jul-2015    18
8  Aug-2015    97
9  Sep-2015    61
10 Oct-2015     7
11 Nov-2015    42
12 Dec-2015    32

NewData
      Month Value
1  Nov-2015    88
2  Dec-2015    45
3  Jan-2016    32
4  Feb-2016    11

Here is the output I want:

JoinData
      Month Value
1  Jan-2015     3
2  Feb-2015    76
3  Mar-2015    31
4  Apr-2015    45
5  May-2015    99
6  Jun-2015    95
7  Jul-2015    18
8  Aug-2015    97
9  Sep-2015    61
10 Oct-2015     7
11 Nov-2015    88
12 Dec-2015    45
13 Jan-2016    32
14 Feb-2016    11

Thanks for @akrun, the problem is solved, and the following code works smoothly!!

rbindlist(list(OldData, NewData))[!duplicated(Month, fromLast=TRUE)]

Update: Now, let's upgrade our problem little bit. suppose our OldData and NewData have another column called "Type". How do we merge/update it this time?

> OldData Month Type Value 1 2015-01 A 3 2 2015-02 A 76 3 2015-03 A 31 4 2015-04 A 45 5 2015-05 A 99 6 2015-06 A 95 7 2015-07 A 18 8 2015-08 A 97 9 2015-09 A 61 10 2015-10 A 7 11 2015-11 B 42 12 2015-12 C 32 13 2015-12 D 77

> NewData Month Type Value 1 2015-11 A 88 2 2015-12 C 45 3 2015-12 D 22 4 2016-01 A 32 5 2016-02 A 11

The JoinData will suppose to update all value from NewData ass following:

> JoinData Month Type Value 1 2015-01 A 3 2 2015-02 A 76 3 2015-03 A 31 4 2015-04 A 45 5 2015-05 A 99 6 2015-06 A 95 7 2015-07 A 18 8 2015-08 A 97 9 2015-09 A 61 10 2015-10 A 7 11 2015-11 B 42 12 2015-11 A 88 (originally not included, added from the NewData) 12 2015-12 C 45 (Updated the value by NewData) 13 2015-12 D 22 (Updated the value by NewData) 14 2016-01 A 32 (newly added from NewData) 15 2016-02 A 11 (newly added from NewData)

Thanks for @akrun: I have got the solution here for the second question as well. Thanks for the help for everyone here! Here is the answer:

d1 <- merge(OldData, NewData, by = c("Month","Type"), all = TRUE);d2 <- transform(d1, Value.x= ifelse(!is.na(Value.y), Value.y, Value.x))[-4];d2[!duplicated(d2[1:2], fromLast=TRUE),]

Upvotes: 1

Views: 1176

Answers (1)

akrun
akrun

Reputation: 887058

Here is an option using data.table (similar approach as @thelatemail mentioned in the comments)

library(data.table)
rbindlist(list(OldData, NewData))[!duplicated(Month, fromLast=TRUE)]

Or

rbindlist(list(OldData, NewData))[,if(.N >1) .SD[.N] else .SD, Month] 

Upvotes: 1

Related Questions