Gabriella
Gabriella

Reputation: 312

Conditionally replace cells in data frame based on another data frame

In the interest of learning better coding practices, can anyone show me a more efficient way of solving my problem? Maybe one that doesn't require new columns...

Problem: I have two data frames: one is my main data table (t) and the other contains changes I need to replace in the main table (Manual_changes). Example: Sometimes the CaseID is matched with the wrong EmployeeID in the file.

I can't provide the main data table, but the Manual_changes file looks like this:

Manual_changes = structure(list(`Case ID` = c(46605, 25321, 61790, 43047, 12157, 
16173, 94764, 38700, 41798, 56198, 79467, 61907, 89057, 34232, 
100189), `Employee ID` = c(NA, NA, NA, NA, NA, NA, NA, NA, 906572, 
164978, 145724, 874472, 654830, 846333, 256403), `Age in Days` = c(3, 
3, 3, 12, 0, 0, 5, 0, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, 
-15L), class = c("tbl_df", "tbl", "data.frame"))

temp = merge(t, Manual_changes, by = "Case ID", all.x = TRUE)
temp$`Employee ID.y` = ifelse(is.na(temp$`Employee ID.y`), temp$`Employee ID.x`, temp$`Employee ID.y`)
temp$`Age in Days.y`= ifelse(is.na(temp$`Age in Days.y`), temp$`Age in Days.x`, temp$`Age in Days.y`)
temp$`Age in Days.x` = NULL
temp$`Employee ID.x` = NULL
colnames(temp) = colnames(t)
t = temp

Upvotes: 1

Views: 34

Answers (1)

akrun
akrun

Reputation: 886948

We could use coalesce

library(dplyr)
left_join(t, Manual_changes, by = "Case ID") %>%
          mutate(Employee_ID.y = coalesce(`Employee ID.x`, `Employee ID.y`),
           `Age in Days.y` = coalesce(`Age in Days.x`, `Age in Days.y`))

Or with data.table

library(data.table)
setDT(t)[Manual_changes, 
   c('Employee ID', 'Age in Days') :=  
            .(fcoalesce(`Employee ID.x`, `Employee ID.y`),
                  fcoalesce(`Age in Days.x`, `Age in Days.y`)),

                on = .(`Case ID`)]

Upvotes: 1

Related Questions