Reputation: 115
I am doing some data cleaning and trying to perform a conditional replacement with dplyr. However, I have not managed to fill a column with the values of another column according to a given condition.
The idea is simple. I want to generate a new variable X5 as follows:
- if X3 is different from X4 and X3 is not NA (missing value), X5 equals X3;
- if X3 equals X4, X5 equals X2;
- if X3 is different from X4 and X2 equals X4, X5 equals X1;
- if X3 and X2 are both different from X4, X5 equals X1.
Can anyone help?
df <- structure(list(
  X1 = structure(c(2L, 3L, 1L, 1L), .Label = c("683 513", "8", "ABA"), class = "factor"),
  X2 = structure(c(2L, 1L, 3L, NA), .Label = c("10", "983 035", "A"), class = "factor"),
  X3 = structure(c(2L, 1L, NA, NA), .Label = c("963 654", "A - J"), class = "factor"),
  X4 = structure(c(3L, 2L, 1L, 4L), .Label = c("A", "A - B", "A - J", "K - B"), class = "factor")
), class = "data.frame", row.names = c(NA, -4L))
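The condition "X3 different from NA" cannot be written as `X3 != NA`: in R any comparison with NA returns NA, so the test for "not missing" is `!is.na()`. A small base-R illustration, using the X3 and X4 values from the data above typed in as character vectors (the names `x3` and `x4` are just for this sketch):

```r
# X3 and X4 from the question's data, as character vectors
x3 <- c("A - J", "963 654", NA, NA)
x4 <- c("A - J", "A - B", "A", "K - B")

x3 != NA                # NA NA NA NA: a comparison with NA is never TRUE or FALSE
!is.na(x3)              # TRUE TRUE FALSE FALSE: the proper "not missing" test
x3 != x4                # FALSE TRUE NA NA: NA wherever X3 is missing
!is.na(x3) & x3 != x4   # FALSE TRUE FALSE FALSE: FALSE & NA is FALSE
```

The last line matters for case_when: a condition that evaluates to NA is treated as not matching, so rows with a missing X3 fall through to later conditions.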
Upvotes: 0
Views: 42
Reputation: 245
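This uses the mutate() and case_when() functions from dplyr. All columns are converted to character first, because comparing factors that have different level sets raises an error.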
library(dplyr)

test_data %>%
  # convert the factors to character so the columns can be compared
  mutate(across(everything(), as.character)) %>%
  mutate(X5 = case_when(!is.na(X3) & X3 != X4 ~ X3,
                        X3 == X4 ~ X2,
                        X2 == X4 & X3 != X4 ~ X1,
                        X3 != X4 & X2 != X4 ~ X1))
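A self-contained sketch of this approach on the question's data, assuming the first rule means "X3 is present and differs from X4" (`!is.na(X3) & X3 != X4`) and that the last rule compares both X3 and X2 with X4. The data frame holds the same values as the question's dput() output, typed in by hand as character columns:

```r
library(dplyr)

# Same values as the question's dput() output
test_data <- data.frame(
  X1 = c("8", "ABA", "683 513", "683 513"),
  X2 = c("983 035", "10", "A", NA),
  X3 = c("A - J", "963 654", NA, NA),
  X4 = c("A - J", "A - B", "A", "K - B")
)

res <- test_data %>%
  # ensure every column is character before comparing
  mutate(across(everything(), as.character)) %>%
  mutate(X5 = case_when(!is.na(X3) & X3 != X4 ~ X3,  # rule 1
                        X3 == X4 ~ X2,                # rule 2
                        X2 == X4 & X3 != X4 ~ X1,     # rule 3
                        X3 != X4 & X2 != X4 ~ X1))    # rule 4

res$X5
# X5 is c("983 035", "963 654", NA, NA)
```

Rows 3 and 4 come out NA: X3 is missing there, so every comparison involving X3 is NA (or FALSE) and case_when finds no matching condition. If those rows need a value, a final catch-all branch such as `TRUE ~ X1` could be added, depending on what the rules are meant to say for missing values.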
Upvotes: 2
Reputation: 389135
You can write the series of conditions in case_when with the higher-priority conditions first, since case_when uses the first condition that is TRUE for each row.
library(dplyr)

df %>%
  mutate(across(everything(), as.character),
         X5 = case_when(X3 != X4 & X2 == X4 ~ X1,
                        X3 != X4 & X2 != X4 ~ X1,
                        X3 != X4 ~ X3,
                        X3 == X4 ~ X2))
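Running this version on the same data (typed in as character columns, same values as the question's dput() output) shows the priority rule at work. In row 2, both `X3 != X4 & X2 != X4` and `X3 != X4` are TRUE; case_when stops at the first one, so X5 gets X1 ("ABA") instead of X3. This is also why the two answers differ on row 2: the question's rules overlap there. The last lines are a toy illustration with hypothetical values `x`:

```r
library(dplyr)

# Same values as the question's data, as character columns
df <- data.frame(
  X1 = c("8", "ABA", "683 513", "683 513"),
  X2 = c("983 035", "10", "A", NA),
  X3 = c("A - J", "963 654", NA, NA),
  X4 = c("A - J", "A - B", "A", "K - B")
)

res <- df %>%
  mutate(across(everything(), as.character),
         X5 = case_when(X3 != X4 & X2 == X4 ~ X1,
                        X3 != X4 & X2 != X4 ~ X1,
                        X3 != X4 ~ X3,
                        X3 == X4 ~ X2))
# res$X5 is c("983 035", "ABA", NA, NA)

# Toy example: the same conditions in a different order give different results
x <- c(1, 5, 10)
order_a <- case_when(x > 3 ~ "big", x > 8 ~ "huge", TRUE ~ "small")  # "small" "big" "big"
order_b <- case_when(x > 8 ~ "huge", x > 3 ~ "big", TRUE ~ "small")  # "small" "big" "huge"
```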
Upvotes: 2