Reputation: 675
I have a large dataset and I am trying to create a new column which incorporates either Variable 1 or Variable 2 based on different scenarios. I have already converted the different scenarios into groups (1 to 4 in the Scenario column).
Where Variables 1 and 2 are populated with a value for a specific sample I need Variable 1 to take precedence over Variable 2 in the New_Var column (this is scenario 1). Where Variable 1 has a value and Variable 2 has a NA value for a specific sample I need to populate the sample in the New_Var column with the Variable 1 value (this is scenario 2). Where Variable 1 is NA, and Variable 2 is a value for a specific sample I need Variable 2 to populate the New_Var column (this is scenario 3). Where both Variables 1 and 2 are NA I need the NA result reported in the New_Var column.
I have already been able to mutate a new column using dplyr for each scenario, but unsure how to generate the New_Var column for the desired output below.
Sample <- c('ID1','ID2','ID3','ID4','ID5','ID6','ID7','ID8','ID9','ID10','ID11')
Var1 <- c(1,1,1,1.5,2.1,4,NA,NA,NA,NA,NA)
Var2 <- c(3,5,1,NA,NA,NA,1.6,1.4,2,NA,NA)
Scenario <- c(1,1,1,2,2,2,3,3,3,4,4)
New_Var <- c(1,1,1,1.5,2.1,4,1.6,1.4,2,0,0)
df <- data.frame(Sample,Var1,Var2,Scenario,New_Var)
df
Sample Var1 Var2 Scenario New_Var
1 ID1 1.0 3.0 1 1.0
2 ID2 1.0 5.0 1 1.0
3 ID3 1.0 1.0 1 1.0
4 ID4 1.5 NA 2 1.5
5 ID5 2.1 NA 2 2.1
6 ID6 4.0 NA 2 4.0
7 ID7 NA 1.6 3 1.6
8 ID8 NA 1.4 3 1.4
9 ID9 NA 2.0 3 2.0
10 ID10 NA NA 4 NA
11 ID11 NA NA 4 NA
Upvotes: 1
Views: 534
Reputation: 887741
We can use fcoalesce
from data.table
library(data.table)
setDT(df)[, newvar := fcoalesce(Var1, Var2)]
Upvotes: 1
Reputation: 389225
You can use coalesce
which handles all the 4 cases as described.
library(dplyr)
df %>% mutate(newvar = coalesce(Var1, Var2))
# Sample Var1 Var2 Scenario newvar
#1 ID1 1.0 3.0 1 1.0
#2 ID2 1.0 5.0 1 1.0
#3 ID3 1.0 1.0 1 1.0
#4 ID4 1.5 NA 2 1.5
#5 ID5 2.1 NA 2 2.1
#6 ID6 4.0 NA 2 4.0
#7 ID7 NA 1.6 3 1.6
#8 ID8 NA 1.4 3 1.4
#9 ID9 NA 2.0 3 2.0
#10 ID10 NA NA 4 NA
#11 ID11 NA NA 4 NA
Or use ifelse
in base R/dplyr
:
transform(df, newvar = ifelse(is.na(Var1), Var2, Var1))
df %>% mutate(newvar = if_else(is.na(Var1), Var2, Var1))
Upvotes: 3