chipsin
chipsin

Reputation: 675

Mutate a new variable using a value from one of two variables based on a condition

I have a large dataset and I am trying to create a new column which incorporates either Variable 1 or Variable 2 based on different scenarios. I have already converted the different scenarios into groups (1 to 4 in the Scenario column).

Where Variables 1 and 2 are populated with a value for a specific sample I need Variable 1 to take precedence over Variable 2 in the New_Var column (this is scenario 1). Where Variable 1 has a value and Variable 2 has a NA value for a specific sample I need to populate the sample in the New_Var column with the Variable 1 value (this is scenario 2). Where Variable 1 is NA, and Variable 2 is a value for a specific sample I need Variable 2 to populate the New_Var column (this is scenario 3). Where both Variables 1 and 2 are NA I need the NA result reported in the New_Var column.

I have already been able to mutate a new column using dplyr for each scenario, but unsure how to generate the New_Var column for the desired output below.

Sample <- c('ID1','ID2','ID3','ID4','ID5','ID6','ID7','ID8','ID9','ID10','ID11') 
Var1 <- c(1,1,1,1.5,2.1,4,NA,NA,NA,NA,NA)
Var2 <- c(3,5,1,NA,NA,NA,1.6,1.4,2,NA,NA)
Scenario <- c(1,1,1,2,2,2,3,3,3,4,4)
New_Var <- c(1,1,1,1.5,2.1,4,1.6,1.4,2,0,0)
df <- data.frame(Sample,Var1,Var2,Scenario,New_Var)
df

   Sample Var1 Var2 Scenario New_Var
1     ID1  1.0  3.0        1     1.0
2     ID2  1.0  5.0        1     1.0
3     ID3  1.0  1.0        1     1.0
4     ID4  1.5   NA        2     1.5
5     ID5  2.1   NA        2     2.1
6     ID6  4.0   NA        2     4.0
7     ID7   NA  1.6        3     1.6
8     ID8   NA  1.4        3     1.4
9     ID9   NA  2.0        3     2.0
10   ID10   NA   NA        4     NA
11   ID11   NA   NA        4     NA

Upvotes: 1

Views: 534

Answers (2)

akrun
akrun

Reputation: 887741

We can use fcoalesce from data.table

library(data.table)
setDT(df)[, newvar := fcoalesce(Var1, Var2)]

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389225

You can use coalesce which handles all the 4 cases as described.

library(dplyr)
df %>%  mutate(newvar = coalesce(Var1, Var2))

#   Sample Var1 Var2 Scenario newvar
#1     ID1  1.0  3.0        1    1.0
#2     ID2  1.0  5.0        1    1.0
#3     ID3  1.0  1.0        1    1.0
#4     ID4  1.5   NA        2    1.5
#5     ID5  2.1   NA        2    2.1
#6     ID6  4.0   NA        2    4.0
#7     ID7   NA  1.6        3    1.6
#8     ID8   NA  1.4        3    1.4
#9     ID9   NA  2.0        3    2.0
#10   ID10   NA   NA        4     NA
#11   ID11   NA   NA        4     NA

Or use ifelse in base R/dplyr :

transform(df, newvar = ifelse(is.na(Var1), Var2, Var1))
df %>% mutate(newvar = if_else(is.na(Var1), Var2, Var1))

Upvotes: 3

Related Questions