Can't create variable conditioned on NA values of other 2 variables

I need to create a variable which is similar to var1 if var2 is missing, similar to var2 if var1 is empty, the mean of var1 and var 2 if the two are not missing ((var1+var2)/2) and finally NA if both var1 and var2 are also missing.

I have data like:

    library(tidyverse)

    df <- tibble(
    var1 = c(1, 2, 3, 4, NA, NA, 3, 2),
    var2 = c(3, 5, NA, 2, 3, NA, 4, NA)
    )

The result should be:

    var1  var2   newvar
    1      3      2
    2      5      3.5
    3      NA     3 
    4      2      3
    NA     3      3
    NA     NA     NA
    3      4      3.5
    2      NA     2 

I have tried using main R recoding functions, also tried using case_when:

df <- df %>% mutate (
newvar = case_when(
var1 == NA ~ var2,
var2 == NA ~ var1,
TRUE   ~ (var1+var2)/2
)
)

Not sure whether the last line would be correct but anyway the code didn't work due to missings, it says:

Error in mutate_impl(.data, dots) : Evaluation error: NAs are not allowed in subscripted assignments.

Upvotes: 1

Views: 158

Answers (2)

Matias Andina
Matias Andina

Reputation: 4230

Almost there, just some minor edits and it's working on my end. It's usually better to use is.na(x) instead of x == NA. Also, your TRUE at the end should check what you actually want, the case where none of them are NA.

df %>% mutate (
  newvar = case_when(
    is.na(var1) ~ var2,
    is.na(var2) ~ var1,
    !is.na(var1) && !is.na(var2)   ~ (var1+var2)/2
  )
)

Produces

# A tibble: 8 x 3
   var1  var2 newvar
  <dbl> <dbl>  <dbl>
1     1     3    2  
2     2     5    3.5
3     3    NA    3  
4     4     2    3  
5    NA     3    3  
6    NA    NA   NA  
7     3     4    3.5
8     2    NA    2  

Upvotes: 1

d.b
d.b

Reputation: 32548

df %>% mutate (
    newvar = case_when(
        xor(is.na(var1), is.na(var2)) ~ pmax(var1, var2, na.rm = TRUE),
        !is.na(var1) & !is.na(var2) ~ (var1 + var2)/2,
        TRUE ~ NaN
    )
)

Upvotes: 2

Related Questions