tjebo

Reputation: 23757

Value matching with NA - missing values - using mutate

I am somewhat stuck. Is there a better way than the approach below to do value matching within mutate that treats NAs as "real values"?

library(dplyr)

data_foo <- data.frame(A = c(1:2, NA, 4, NA), B = c(1, 3, NA, NA, 4))

Not the desired output:

data_foo %>% mutate(irr = A==B)

#>    A  B   irr
#> 1  1  1  TRUE
#> 2  2  3 FALSE
#> 3 NA NA    NA
#> 4  4 NA    NA
#> 5 NA  4    NA

data_foo %>% rowwise() %>% mutate(irr = A%in%B)

#> Source: local data frame [5 x 3]
#> Groups: <by row>
#> 
#> # A tibble: 5 x 3
#>       A     B irr  
#>   <dbl> <dbl> <lgl>
#> 1     1     1 TRUE 
#> 2     2     3 FALSE
#> 3    NA    NA FALSE
#> 4     4    NA FALSE
#> 5    NA     4 FALSE

Desired output: The code below produces the desired column, irr. I am using these somewhat cumbersome helper columns. Is there a shorter way?

data_foo %>% 
  mutate(NA_A = is.na(A), 
         NA_B = is.na(B), 
         irr = if_else(is.na(A)|is.na(B), NA_A == NA_B, A == B))

#>    A  B  NA_A  NA_B   irr
#> 1  1  1 FALSE FALSE  TRUE
#> 2  2  3 FALSE FALSE FALSE
#> 3 NA NA  TRUE  TRUE  TRUE
#> 4  4 NA FALSE  TRUE FALSE
#> 5 NA  4  TRUE FALSE FALSE

Upvotes: 6

Views: 635

Answers (4)

Rui Barradas

Reputation: 76470

Maybe simpler than akrun's answer?
Either of the two approaches below produces the expected result. Note that as.character won't work, because as.character(NA) returns NA_character_, and comparing with NA still yields NA.

data_foo %>%
  mutate(irr = paste(A) == paste(B))

data_foo %>%
  mutate(irr = sQuote(A) == sQuote(B))

#Source: local data frame [5 x 3]
#Groups: <by row>
#
## A tibble: 5 x 3
#      A     B irr  
#  <dbl> <dbl> <lgl>
#1     1     1 TRUE 
#2     2     3 FALSE
#3    NA    NA TRUE 
#4     4    NA FALSE
#5    NA     4 FALSE
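
To see why paste works here while as.character does not, the following base R sketch compares the two on a missing value. It also shows one caveat that is an addition to the answer rather than part of it: after paste, a missing value is indistinguishable from the literal string "NA", which could matter for character columns. The variable names are only illustrative.

```r
# paste() turns a missing value into the string "NA", so == returns TRUE or FALSE
paste(NA) == paste(NA)                  # TRUE
# as.character() keeps the value missing, so == still returns NA
as.character(NA) == as.character(NA)    # NA

# Caveat: a real missing value and the literal string "NA" compare equal after paste()
x <- c("NA", NA)
pasted_equal <- paste(x[1]) == paste(x[2])
pasted_equal                            # TRUE
```

For numeric columns such as A and B in the question this caveat does not apply, since a number can never print as "NA".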

Edit.

  1. Following the comments below I have updated the code and it now follows akrun's suggestion.
  2. There is also the excellent idea in tmfmnk's answer. I use a similar one in yet another way of solving the question's problem.

The documentation of all.equal says that

Do not use all.equal directly in if expressions—either use isTRUE(all.equal(....)) or identical if appropriate.

Though there is no if expression in mutate, I believe that isTRUE(all.equal(...)) is more robust than identical, because all.equal compares numbers within a tolerance, and it gives the same result when the values being compared are exactly equal.

data_foo %>%
  rowwise() %>%   # all.equal compares whole vectors, so evaluate it row by row
  mutate(irr = isTRUE(all.equal(A, B))) %>%
  ungroup()
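
As a small illustration of the difference between identical and isTRUE(all.equal(...)), the sketch below compares single values in base R. The variable names are made up for the example. It assumes all.equal's default tolerance.

```r
x <- 0.1 + 0.2
y <- 0.3

exact <- identical(x, y)            # FALSE: the two doubles differ in the last bits
approx <- isTRUE(all.equal(x, y))   # TRUE: equal within all.equal's default tolerance

# Both functions treat two missing values of the same type as equal
both_na_identical <- identical(NA_real_, NA_real_)          # TRUE
both_na_all_equal <- isTRUE(all.equal(NA_real_, NA_real_))  # TRUE

# A value compared with NA is not equal; all.equal returns a message, not TRUE
one_na_all_equal <- isTRUE(all.equal(4, NA_real_))          # FALSE
```

Both functions compare their arguments as whole objects, so inside mutate they need to be applied one row at a time, for example after rowwise().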

Upvotes: 5

IceCreamToucan

Reputation: 28695

The coalesce function is useful if you want to perform an action when a value is NA. It returns the first non-missing value, so the second argument is used only where A == B is NA.

data_foo %>% 
  mutate(irr = coalesce(A == B, is.na(A) & is.na(B)))

#    A  B   irr
# 1  1  1  TRUE
# 2  2  3 FALSE
# 3 NA NA  TRUE
# 4  4 NA FALSE
# 5 NA  4 FALSE

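The same pattern written over all columns with reduce (from purrr). Note that chaining == is only reliable for two columns, because with more columns the logical result of the first comparison is compared with the next column.

library(purrr)  # provides reduce()

data_foo %>% 
  mutate(irr = coalesce(reduce(., `==`), rowMeans(is.na(.)) == 1))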
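
For three or more columns, one option is to compare every column with the first one and combine the results with &. The following is a minimal base R sketch of that idea. The helper all_equal_with_na and the data frame data_three are hypothetical names introduced for illustration, not part of dplyr or purrr.

```r
# Hypothetical helper: TRUE when every column matches the first one in that row,
# treating NA as a value that only equals another NA.
all_equal_with_na <- function(df) {
  first <- df[[1]]
  matches <- lapply(df[-1], function(col) {
    (is.na(first) & is.na(col)) | (!is.na(first) & !is.na(col) & first == col)
  })
  Reduce(`&`, matches)
}

# Made-up example data with three columns
data_three <- data.frame(
  A = c(2, 2, NA, 4, NA),
  B = c(2, 3, NA, NA, 4),
  C = c(2, 2, NA, 4, NA)
)
irr <- all_equal_with_na(data_three)
irr   # TRUE FALSE TRUE FALSE FALSE
```

The result is an ordinary logical vector, so it can be assigned as a new column, for example with data_three$irr <- irr.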

Upvotes: 2

akrun

Reputation: 887301

Using map2

library(tidyverse)
data_foo %>%
   mutate(irr = map2_lgl(A, B, `%in%`))
#   A  B   irr
#1  1  1  TRUE
#2  2  3 FALSE
#3 NA NA  TRUE
#4  4 NA FALSE
#5 NA  4 FALSE
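
The reason %in% works where == does not is that %in% is built on match(), which can match NA to NA, while == propagates missingness. A short base R sketch of the difference follows. It uses mapply in place of map2_lgl only to avoid loading extra packages.

```r
# == propagates missingness: comparing with NA gives NA
NA == NA            # NA
# %in% is built on match(), which can match NA to NA
NA %in% NA          # TRUE
4 %in% NA           # FALSE
NA %in% 4           # FALSE

# Element-wise version of the same idea with the question's data
A <- c(1, 2, NA, 4, NA)
B <- c(1, 3, NA, NA, 4)
irr <- mapply(function(a, b) a %in% b, A, B)
irr                 # TRUE FALSE TRUE FALSE FALSE
```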

Or with setequal

data_foo %>% 
   rowwise() %>%
   mutate(irr = setequal(A, B))

The above methods are concise, but they loop over the rows. Alternatively, we can replace the NA values with a sentinel value that does not occur in the data and then use ==

data_foo %>%
     mutate_all(list(new = ~ replace_na(., -999))) %>%
     transmute(A, B, irr = A_new == B_new)
#   A  B   irr
#1  1  1  TRUE
#2  2  3 FALSE
#3 NA NA  TRUE
#4  4 NA FALSE
#5 NA  4 FALSE

Or with bind_cols and reduce

data_foo %>%
    mutate_all(replace_na, -999) %>% 
    reduce(`==`) %>% 
    bind_cols(data_foo, irr = .)
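
The sentinel approach relies on -999 never appearing as a real value. The base R sketch below uses made-up vectors to show what goes wrong if it does, along with a simple guard that could be run first. The names sentinel and sentinel_in_data are illustrative only.

```r
# Made-up data in which the sentinel value also occurs as a real value
A <- c(1, NA, -999)
B <- c(1, NA, NA)
sentinel <- -999

# Replace missing values with the sentinel, then compare
A_new <- ifelse(is.na(A), sentinel, A)
B_new <- ifelse(is.na(B), sentinel, B)
irr <- A_new == B_new
irr   # TRUE TRUE TRUE: the third row is a false match (-999 vs NA)

# A guard that detects the collision before comparing
sentinel_in_data <- any(c(A, B) == sentinel, na.rm = TRUE)
sentinel_in_data   # TRUE, so another sentinel should be chosen
```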

Upvotes: 6

tmfmnk

Reputation: 39868

Could also be a possibility:

data_foo %>%
 rowwise() %>%
 mutate(irr = identical(A, B)) %>%
 ungroup()

      A     B irr  
  <dbl> <dbl> <lgl>
1     1     1 TRUE 
2     2     3 FALSE
3    NA    NA TRUE 
4     4    NA FALSE
5    NA     4 FALSE

Upvotes: 2
