Reputation: 1073
I am trying to merge two dataframes such that if any not columns present in df1's variables source should take from df2's rows. I tried the merge below but no works. Please help.
df1 <- data.frame(source = c("Pr1", "Pr2","Pr3"),
date = c("Jan 2018","Feb 2018","Mar 2018"),
Freq = c(100,20,30))
df2 <- data.frame(source = c("Pr1","Pr2","Pr1"),
date = c("Sep 2018","Aug 2018","Jan 2018"),
Freq = c(0,0,0))
df <- merge(df1,df2,by = c("source","date"))
Expected Output:
source date Freq
1 Pr1 Jan 2018 100
2 Pr2 Feb 2018 20
3 Pr3 Mar 2018 30
4 Pr1 Sep 2018 0
5 Pr2 Aug 2018 0
Upvotes: 0
Views: 45
Reputation: 102329
Maybe you can set all = TRUE
when you used merge
and apply aggregate
for update, i.e.,
df <- aggregate(Freq~.,merge(df1,df2,all = TRUE),FUN = max)
such that
> df
source date Freq
1 Pr2 Feb 2018 20
2 Pr1 Jan 2018 100
3 Pr3 Mar 2018 30
4 Pr2 Aug 2018 0
5 Pr1 Sep 2018 0
Upvotes: 2
Reputation: 4357
A full join is required
df3 <- merge(df1, df2, by = c("source", "date"), all = TRUE, suffixes = c("", ".y"))
df3$Freq[is.na(df3$Freq)] <- df3$Freq.y[is.na(df3$Freq)]
source date Freq Freq.y
1 Pr1 Jan 2018 100 0
2 Pr1 Sep 2018 0 0
3 Pr2 Feb 2018 20 NA
4 Pr2 Aug 2018 0 0
5 Pr3 Mar 2018 30 NA
Upvotes: 2
Reputation: 1928
Here's a tidyverse/dplyr answer.
library(tidyverse)
df1 <- data.frame(source = c("Pr1", "Pr2","Pr3"),
date = c("Jan 2018","Feb 2018","Mar 2018"),
Freq = c(100,20,30))
df2 <- data.frame(source = c("Pr1","Pr2","Pr1"),
date = c("Sep 2018","Aug 2018","Jan 2018"),
Freq = c(0,0,0))
df <- full_join(df1,df2,by = c("source","date"))
source date Freq.x Freq.y
1 Pr1 Jan 2018 100 0
2 Pr2 Feb 2018 20 NA
3 Pr3 Mar 2018 30 NA
4 Pr1 Sep 2018 NA 0
5 Pr2 Aug 2018 NA 0
final_df <- df %>%
mutate(Frequency = if_else(is.na(Freq.x), Freq.y, Freq.x)) %>%
select(source, date, Frequency)
source date Frequency
1 Pr1 Jan 2018 100
2 Pr2 Feb 2018 20
3 Pr3 Mar 2018 30
4 Pr1 Sep 2018 0
5 Pr2 Aug 2018 0
Just note the behavior of the if_else() - it will keep the df1 frequency if it is 0 (not NA).
Upvotes: 1