Replace with conditions

Question

I am performing some data cleaning and trying to do a conditional replacement using dplyr. However, I have not managed to replace one column with the value of another according to a given condition.

The idea is simple: if X3 is different from X4 and X3 is not NA (missing value), then a new variable X5 is generated with a value equal to X3; if X3 equals X4, then X5 equals X2; if X3 is different from X4 and X2 equals X4, I want X5 to equal X1; and if X3 and X2 are both different from X4, then X5 equals X1.

Can anyone help here?

structure(list(X1 = structure(c(2L, 3L, 1L, 1L), .Label = c("683 513", 
"8", "ABA"), class = "factor"), X2 = structure(c(2L, 1L, 3L, 
NA), .Label = c("10", "983 035", "A"), class = "factor"), X3 = structure(c(2L, 
1L, NA, NA), .Label = c("963 654", "A - J"), class = "factor"), 
    X4 = structure(c(3L, 2L, 1L, 4L), .Label = c("A", "A - B", 
    "A - J", "K - B"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))

Tob · Accepted Answer

This uses the mutate() and case_when() functions from dplyr.

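library(dplyr)

# test_data is the data frame built from the dput() output in the question
test_data %>% 
  # convert factors to character so values from different columns can be compared
  mutate(across(everything(), as.character)) %>%
  mutate(X5 = case_when(X3 != X4 & !is.na(X3) ~ X3,   # X3 present and different from X4
                        X3 == X4 ~ X2,                # X3 equal to X4
                        X2 == X4 & X3 != X4 ~ X1, 
                        X3 != X4 & X2 != X4 ~ X1))    # both X3 and X2 differ from X4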
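
One thing to watch for: case_when() treats a condition that evaluates to NA as not matched, and any comparison with a missing X3 evaluates to NA. Rows where X3 is missing therefore match none of the branches above and end up with NA in X5. Below is a minimal sketch of one way to handle this. It assumes (this is my reading, not something the question states explicitly) that a missing X3 counts as "different from X4", so those rows should fall back to X1. The data frame is the question's sample data rebuilt as character columns.

```r
library(dplyr)

# Sample data from the question, rebuilt as plain character columns
test_data <- data.frame(
  X1 = c("8", "ABA", "683 513", "683 513"),
  X2 = c("983 035", "10", "A", NA),
  X3 = c("A - J", "963 654", NA, NA),
  X4 = c("A - J", "A - B", "A", "K - B"),
  stringsAsFactors = FALSE
)

result <- test_data %>%
  mutate(X5 = case_when(
    !is.na(X3) & X3 != X4 ~ X3,  # X3 present and different from X4
    !is.na(X3) & X3 == X4 ~ X2,  # X3 present and equal to X4
    TRUE ~ X1                    # assumption: everything else (e.g. missing X3) takes X1
  ))

print(result)
```

On the sample data this gives X5 = "983 035", "963 654", "683 513", "683 513". If rows with a missing X3 should stay NA instead, drop the final TRUE ~ X1 branch.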
