mcl76
mcl76

Reputation: 1

How to merge two data frames based on unique ID's and changing/updating a column

I have just been given 2 sales data frames; the main data frame houses all the information while the second has the latest sales for the month.

I need to merge the information from one data frame into the other based on a client's unique ID. Specifically, I would like to update information in the main data frame based on the new df.

I have tried using the merge function in R but have errors in replacing the information. I have done this merge in excel (but it takes a while since I have a lot of information).

The data frames looks a little like this...

MainDF
UID       Status      Gender     SaleType
136273    inactive      M        Repair
182732    inactive      F        Product
298372    quit          F        Repair
283715    inactive      M        Product

NewDFtoAdd
UID          Gender     
136273         M
293827         F
283715         F
167643         M

I want to take the existing IDs in the main df and rewrite their status as 'active' if they appear in the NewDFtoAdd.

MainDF
UID       Status      Gender     SaleType
136273    **active**        M        Repair
182732    inactive      F        Product
298372    quit          F        Repair
283715    **active**        F        Product

I really want to code this in R; so as new data is given I can just update accordingly and my analysis code can continue.

Upvotes: 0

Views: 1046

Answers (2)

Kay
Kay

Reputation: 2332

If you love dplyr and want to go that route, you can do this:

library(dplyr)

MainDF%>%
mutate_at(vars(Status), 
        list(~case_when(
          UID %in% NewDFtoAdd$UID ~ "active",
          TRUE ~ Status
        )))

#     UID   Status Gender SaleType
#1 136273   active      M   Repair
#2 182732 inactive      F  Product
#3 298372     quit      F   Repair
#4 283715   active      M  Product

Upvotes: 0

akrun
akrun

Reputation: 886938

We can use a data.table join to replace the 'Status' to 'active' by joining on the 'UID'

library(data.table)
setDT(MainDF)[NewDFtoAdd, Status := "active", on = .(UID)]
MainDF
#      UID   Status Gender SaleType
#1: 136273   active      M   Repair
#2: 182732 inactive      F  Product
#3: 298372     quit      F   Repair
#4: 283715   active      M  Product

Or using base R

i1 <- MainDF$UID %in% NewDFtoAdd$UID
MainDF$UID[i1] <- "active"

NOTE: This gives the expected output.

data

MainDF <- structure(list(UID = c(136273L, 182732L, 298372L, 283715L), Status = c("inactive", 
"inactive", "quit", "inactive"), Gender = c("M", "F", "F", "M"
), SaleType = c("Repair", "Product", "Repair", "Product")), class = "data.frame", row.names = c(NA, 
-4L))

NewDFtoAdd <- structure(list(UID = c(136273L, 293827L, 283715L, 167643L), Gender = c("M", 
"F", "F", "M")), class = "data.frame", row.names = c(NA, -4L))

Upvotes: 1

Related Questions