Reputation: 641
I am trying to create a new column that indicates differences between two existing columns. NAs should be considered values and should be marked as "difference". However, NAs are being "passed through" the != comparator. I have looked for case_when arguments to deal with NAs and looked for alternative not-equal comparators, to no avail.
The below reprex shows the current output and the desired output.
Thank you in advance for your help!
library(dplyr)
library(tidyr)
library(tibble)
df <-
  expand_grid(x = c("a", NA), y = c("b", NA)) %>%
  add_row(x = "a", y = "a") %>%
  add_row(x = "b", y = "b")
df
#> # A tibble: 6 x 2
#> x y
#> <chr> <chr>
#> 1 a b
#> 2 a <NA>
#> 3 <NA> b
#> 4 <NA> <NA>
#> 5 a a
#> 6 b b
# Undesired output: NAs are passed through instead of being treated as values
df %>%
  mutate(z = case_when(
    x == "a" & y == "a" ~ "a",
    x == "b" & y == "b" ~ "b",
    x != y ~ "difference"
  ))
#> # A tibble: 6 x 3
#> x y z
#> <chr> <chr> <chr>
#> 1 a b difference
#> 2 a <NA> <NA>
#> 3 <NA> b <NA>
#> 4 <NA> <NA> <NA>
#> 5 a a a
#> 6 b b b
# Desired output
df %>%
  add_column(z = c(rep("difference", 3), NA_character_, "a", "b"))
#> # A tibble: 6 x 3
#> x y z
#> <chr> <chr> <chr>
#> 1 a b difference
#> 2 a <NA> difference
#> 3 <NA> b difference
#> 4 <NA> <NA> <NA>
#> 5 a a a
#> 6 b b b
Created on 2020-08-06 by the reprex package (v0.3.0)
Upvotes: 2
Views: 2924
Reputation: 641
As @akrun mentioned, there's a workaround that combines is.na with "exclusive or" (xor). Here's what I ended up using:
df %>%
  mutate(z = case_when(
    x == y ~ x,
    xor(is.na(x), is.na(y)) ~ "difference",
    x != y ~ "difference",
    is.na(x) & is.na(y) ~ NA_character_
  ))
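For anyone who wants the same logic outside of case_when, here is a minimal base R sketch. The helper name label_diff is hypothetical (it is not from dplyr or from the posts above), and it assumes both inputs are character vectors of equal length.

```r
# Hypothetical helper: label each pair of values.
# - both present and equal -> the shared value
# - both NA                -> NA
# - anything else          -> "difference"
label_diff <- function(x, y) {
  both_na <- is.na(x) & is.na(y)
  # Guard with is.na() first so `same` is always TRUE/FALSE, never NA
  same <- !is.na(x) & !is.na(y) & x == y
  out <- rep("difference", length(x))
  out[same] <- x[same]
  out[both_na] <- NA_character_
  out
}

# Same six rows as the reprex in the question
x <- c("a", "a", NA, NA, "a", "b")
y <- c("b", NA, "b", NA, "a", "b")
z <- label_diff(x, y)
```

Since the helper is vectorized, it should also work inside mutate(z = label_diff(x, y)) if you prefer to stay in a dplyr pipeline.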
Upvotes: 1
Reputation: 887038
The issue is with == and NA. Any value compared to NA returns NA. It can be corrected by also using is.na in the comparison, but then it needs to be repeated for every condition. Alternatively, an easy fix is to change the NA to a different value, do the comparison, and bind the result to the original dataset:
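library(dplyr)
library(tidyr)  # replace_na() comes from tidyr
df %>%
  # Replace NA with an empty string so == and != return TRUE/FALSE
  mutate(across(x:y, ~ replace_na(.x, ""))) %>%
  transmute(z = case_when(
    x == "a" & y == "a" ~ "a",
    x == "b" & y == "b" ~ "b",
    x != y ~ "difference"
  )) %>%
  # Attach z to the original data, which still holds the NAs
  bind_cols(df, .)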
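To make the underlying behaviour concrete, here is a small base R sketch of how comparisons with NA evaluate. The variable names are just for illustration. As the question's output shows, case_when does not match a row whose condition evaluates to NA, which is why those rows fall through.

```r
# Comparing anything with NA yields NA, not TRUE or FALSE
cmp_eq  <- "a" == NA_character_
cmp_neq <- "a" != NA_character_
both_na <- NA_character_ == NA_character_

# is.na() always returns TRUE or FALSE, so it is safe to combine
one_missing <- xor(is.na("a"), is.na(NA_character_))

# After swapping NA for a placeholder, != behaves like an ordinary comparison
placeholder_neq <- "a" != ""
```

Note that the placeholder approach assumes the empty string never occurs as a real value in x or y; if it can, pick a placeholder that cannot appear in the data.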
Upvotes: 2