prog
prog

Reputation: 1073

merge and update not present values

I am trying to merge two dataframes such that if any not columns present in df1's variables source should take from df2's rows. I tried the merge below but no works. Please help.

df1 <- data.frame(source = c("Pr1", "Pr2","Pr3"),
                     date = c("Jan 2018","Feb 2018","Mar 2018"),
                     Freq = c(100,20,30))
df2 <- data.frame(source = c("Pr1","Pr2","Pr1"),
                     date = c("Sep 2018","Aug 2018","Jan 2018"),
                     Freq = c(0,0,0))

df <- merge(df1,df2,by = c("source","date"))

Expected Output:

  source     date Freq
1    Pr1 Jan 2018  100
2    Pr2 Feb 2018   20
3    Pr3 Mar 2018   30
4    Pr1 Sep 2018    0
5    Pr2 Aug 2018    0

Upvotes: 0

Views: 45

Answers (3)

ThomasIsCoding
ThomasIsCoding

Reputation: 102329

Maybe you can set all = TRUE when you used merge and apply aggregate for update, i.e.,

df <- aggregate(Freq~.,merge(df1,df2,all = TRUE),FUN = max)

such that

> df
  source     date Freq
1    Pr2 Feb 2018   20
2    Pr1 Jan 2018  100
3    Pr3 Mar 2018   30
4    Pr2 Aug 2018    0
5    Pr1 Sep 2018    0

Upvotes: 2

manotheshark
manotheshark

Reputation: 4357

A full join is required

df3 <- merge(df1, df2, by = c("source", "date"), all = TRUE, suffixes = c("", ".y"))
df3$Freq[is.na(df3$Freq)] <- df3$Freq.y[is.na(df3$Freq)]

  source     date Freq Freq.y
1    Pr1 Jan 2018  100      0
2    Pr1 Sep 2018    0      0
3    Pr2 Feb 2018   20     NA
4    Pr2 Aug 2018    0      0
5    Pr3 Mar 2018   30     NA

Upvotes: 2

TTS
TTS

Reputation: 1928

Here's a tidyverse/dplyr answer.

library(tidyverse)

df1 <- data.frame(source = c("Pr1", "Pr2","Pr3"),
                  date = c("Jan 2018","Feb 2018","Mar 2018"),
                  Freq = c(100,20,30))
df2 <- data.frame(source = c("Pr1","Pr2","Pr1"),
                  date = c("Sep 2018","Aug 2018","Jan 2018"),
                  Freq = c(0,0,0))

df <- full_join(df1,df2,by = c("source","date"))
  source     date Freq.x Freq.y
1    Pr1 Jan 2018    100      0
2    Pr2 Feb 2018     20     NA
3    Pr3 Mar 2018     30     NA
4    Pr1 Sep 2018     NA      0
5    Pr2 Aug 2018     NA      0
final_df <- df %>%
  mutate(Frequency = if_else(is.na(Freq.x), Freq.y, Freq.x)) %>%
  select(source, date, Frequency)
  source     date Frequency
1    Pr1 Jan 2018       100
2    Pr2 Feb 2018        20
3    Pr3 Mar 2018        30
4    Pr1 Sep 2018         0
5    Pr2 Aug 2018         0

Just note the behavior of the if_else() - it will keep the df1 frequency if it is 0 (not NA).

Upvotes: 1

Related Questions