chipsin
chipsin

Reputation: 675

Coalescing multiple columns using multiple conditions

I am trying to coalesce multiple columns using multiple conditional statements. In the example below I want to coalesce with A taking precedence over B followed by C where A is >= 0.1 and < 30, while where A is NA, < 0.1 or > 30 I was to coalesce with B taking precedence over A followed by C.

Below is an example dataset:

df <- data.frame(1:8)
df$A <- c(102, 0.04, 0.1, NA_real_, 0.01, 0.01, 0.2, NA_real_)
df$B <- c(20.2, 50.1, 10.1, 6.1, 7.1, NA_real_, 8.1, NA_real_)
df$C <- c(NA_real_, 4.1, NA_real_, NA_real_, NA_real_, 8.1, NA_real_, 10.1)


      A    B    C
1   102 20.2   NA
2  0.04 50.1  4.1
3   0.1 10.1   NA
4    NA  6.1   NA
5  0.01  7.1   NA
6  0.01   NA  8.1
7   0.2  8.1   NA
8    NA   NA 10.1

Below is the desired output:

      A    B    C new_col
1   102 20.2   NA    20.2
2  0.04 50.1  4.1    50.1
3   0.1 10.1   NA     0.1
4    NA  6.1   NA     6.1
5  0.01  7.1   NA     7.1
6  0.01   NA  8.1     8.1
7   0.2  8.1   NA     0.2
8    NA   NA 10.1    10.1

I tried to solve this using mutate and the coalesce functions in the code below but did not get the desired output (in many cases this works OK as long as the value in column A is a value where the value is NA, Nulls are produced in the output).

df <- df %>% 
  mutate(new_col = if_else(A >= 0.1 & A <= 30, 
                           coalesce(A, B, C),
                           coalesce(B, A, C)))

      A    B    C new_col
1   102 20.2   NA    20.2
2  0.04 50.1  4.1    50.1
3   0.1 10.1   NA     0.1
4    NA  6.1   NA    NULL
5  0.01  7.1   NA     7.1
6  0.01   NA  8.1    0.01
7   0.2  8.1   NA     0.2
8    NA   NA 10.1    NULL

Upvotes: 5

Views: 433

Answers (2)

akrun
akrun

Reputation: 887048

We can use case_when

library(dplyr)
df %>%
    mutate(new_col = case_when(A >= 0.1 & A <= 30 & !is.na(A) ~ 
                   coalesce(A, B, C), TRUE ~ coalesce(B, A, C)))

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388907

Your attempt is correct but you need to handle NA's since NA in if_else returns NA.

library(dplyr)
df %>% 
  mutate(new_col = if_else(A >= 0.1 & A <= 30 & !is.na(A), 
                           coalesce(A, B, C),
                           coalesce(B, A, C)))


#       A    B    C new_col
#1 102.00 20.2   NA   20.20
#2   0.04 50.1  4.1   50.10
#3   0.10 10.1   NA    0.10
#4     NA  6.1   NA    6.10
#5   0.01  7.1   NA    7.10
#6   0.01   NA  8.1    0.01
#7   0.20  8.1   NA    0.20
#8     NA   NA 10.1   10.10

data

df <- data.frame(A = c(102, 0.04, 0.1, NA_real_, 0.01, 0.01, 0.2, NA_real_),
         B =  c(20.2, 50.1, 10.1, 6.1, 7.1, NA_real_, 8.1, NA_real_),
         C = c(NA_real_, 4.1, NA_real_, NA_real_, NA_real_, 8.1, NA_real_, 10.1))

Upvotes: 1

Related Questions