Reputation: 95
Suppose I have a data frame with three variables like the one below. I want to add a fourth variable whose values are based on the values of the second and third variables, e.g. if var2 = var3 then var4 = 3; if var2 = Y and var3 = NA then var4 = 1; and if var2 = NA and var3 = Y then var4 = 2.
var1 var2 var3
m01 Y NA
m02 Y NA
m03 NA Y
m04 NA Y
m05 Y Y
m06 Y NA
m07 Y Y
I would like to get a data frame like this:
var1 var2 var3 var4
m01 Y NA 1
m02 Y NA 1
m03 NA Y 2
m04 NA Y 2
m05 Y Y 3
m06 Y NA 1
m07 Y Y 3
I am trying with ifelse but I haven't had success.
Any ideas?
Upvotes: 1
Views: 152
Reputation: 43334
A handful of options:
df <- read.table(text = 'var1 var2 var3
m01 Y NA
m02 Y NA
m03 NA Y
m04 NA Y
m05 Y Y
m06 Y NA
m07 Y Y', head = TRUE, stringsAsFactors = FALSE)
A typical base R approach would be to use apply to iterate rowwise across the requisite columns. apply silently coerces its input to a matrix, which is why some avoid this approach.
apply(df[-1], 1, function(x){sum(which(x == 'Y'))})
#> [1] 1 1 2 2 3 1 3
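The row-wise sum works because which() returns the positions of the matching columns: a Y in the first column contributes 1 and a Y in the second contributes 2. The same arithmetic can be done column-wise with no apply and no matrix coercion. This is a minimal sketch of a different technique, not the approach above; it rebuilds df so that it runs on its own, and it uses %in% rather than == so that NA compares as FALSE instead of NA:

```r
# Same data as df above, rebuilt so the snippet is self-contained.
df <- data.frame(
  var1 = sprintf("m%02d", 1:7),
  var2 = c("Y", "Y", NA, NA, "Y", "Y", "Y"),
  var3 = c(NA, NA, "Y", "Y", "Y", NA, "Y"),
  stringsAsFactors = FALSE
)

# TRUE/FALSE act as 1/0; weighting the second column by 2 gives
# 1 (only var2), 2 (only var3), 3 (both) and 0 (neither).
var4 <- (df$var2 %in% "Y") + 2L * (df$var3 %in% "Y")
var4
#> [1] 1 1 2 2 3 1 3
```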
You could translate it to dplyr with rowwise, which does not coerce to a matrix, but is not usually the fastest possible approach:
library(dplyr)
df %>%
rowwise() %>%
mutate(var4 = sum(which(c(var2, var3) == 'Y')))
#> Source: local data frame [7 x 4]
#> Groups: <by row>
#>
#> # A tibble: 7 x 4
#> var1 var2 var3 var4
#> <chr> <chr> <chr> <int>
#> 1 m01 Y <NA> 1
#> 2 m02 Y <NA> 1
#> 3 m03 <NA> Y 2
#> 4 m04 <NA> Y 2
#> 5 m05 Y Y 3
#> 6 m06 Y <NA> 1
#> 7 m07 Y Y 3
This will also fail as-is for factors (which get converted to integers by c), but they can be coerced beforehand or internally, or you could use is.na instead of checking equality.
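The is.na route can be sketched in base R as follows. The data frame df_f is a hypothetical factor-column copy of the example data, introduced only for illustration. Because is.na() on a data frame returns a logical matrix, the column type no longer matters:

```r
# Hypothetical factor version of the example columns (an assumption, for illustration).
df_f <- data.frame(
  var2 = factor(c("Y", "Y", NA, NA, "Y", "Y", "Y")),
  var3 = factor(c(NA, NA, "Y", "Y", "Y", NA, "Y"))
)

# Logical matrix: TRUE wherever a value is present.
not_missing <- !is.na(df_f)

# Sum the positions of the non-missing columns in each row.
var4_f <- apply(not_missing, 1, function(x) sum(which(x)))
var4_f
```

This treats any non-missing value as a match, so it is only equivalent to the equality check when Y is the only value that can appear.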
More creative base options include pasting the columns together to create a factor that can be deliberately leveled for coercion to integer:
as.integer(factor(paste0(df$var2, df$var3), levels = c('YNA', 'NAY', 'YY')))
#> [1] 1 1 2 2 3 1 3
or using do.call to pass a list of a function and each desired variable of df (flattened with c) to mapply:
do.call(mapply,
c(function(...){sum(which(!is.na(c(...))))},
df[-1],
USE.NAMES = FALSE))
#> [1] 1 1 2 2 3 1 3
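To make the do.call step less opaque, here is a sketch of the explicit mapply call it builds, next to an equivalent do.call form. The helper name count_positions is hypothetical, and df is rebuilt so the snippet runs on its own:

```r
df <- data.frame(
  var1 = sprintf("m%02d", 1:7),
  var2 = c("Y", "Y", NA, NA, "Y", "Y", "Y"),
  var3 = c(NA, NA, "Y", "Y", "Y", NA, "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sum the positions of the non-missing arguments.
count_positions <- function(...) sum(which(!is.na(c(...))))

# Written out by hand: one argument per column after the function.
explicit <- mapply(count_positions, df$var2, df$var3, USE.NAMES = FALSE)

# Built programmatically: the columns of df[-1] become the arguments,
# so this scales to any number of columns without naming them.
args <- c(list(FUN = count_positions), df[-1], list(USE.NAMES = FALSE))
via_do_call <- do.call(mapply, args)

all(explicit == via_do_call)
```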
If you really want the ifelse logic, dplyr::case_when lets you use cascading conditionals without the messy syntax:
df %>% mutate(var4 = case_when(var2 == 'Y' & var3 == 'Y' ~ 3,
var2 == 'Y' ~ 1,
var3 == 'Y' ~ 2))
#> var1 var2 var3 var4
#> 1 m01 Y <NA> 1
#> 2 m02 Y <NA> 1
#> 3 m03 <NA> Y 2
#> 4 m04 <NA> Y 2
#> 5 m05 Y Y 3
#> 6 m06 Y <NA> 1
#> 7 m07 Y Y 3
Upvotes: 1
Reputation: 11
Using ifelse:
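# NA as the fallback keeps var4 numeric; a string fallback would coerce it to character
df$var4 <- ifelse(df$var2 == df$var3, 3,
           ifelse(df$var3 == "NA" & df$var2 == "Y", 1,
           ifelse(df$var2 == "NA" & df$var3 == "Y", 2, NA)))
This works only if the missing values are stored as the literal string "NA". If they are true NA values, replace df$var3 == "NA" with is.na(df$var3) and df$var2 == "NA" with is.na(df$var2), and guard the first test as well, because df$var2 == df$var3 is itself NA whenever either side is missing.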
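For data with true NA values, a nested ifelse needs every comparison to be NA-safe, including the outermost one. A minimal sketch, assuming the question's data with character columns (rebuilt here so the snippet is self-contained); the intermediate name both is introduced only for readability:

```r
df <- data.frame(
  var1 = sprintf("m%02d", 1:7),
  var2 = c("Y", "Y", NA, NA, "Y", "Y", "Y"),
  var3 = c(NA, NA, "Y", "Y", "Y", NA, "Y"),
  stringsAsFactors = FALSE
)

# FALSE & NA is FALSE, so the is.na() guards turn the NA comparisons into FALSE.
both <- !is.na(df$var2) & !is.na(df$var3) & df$var2 == df$var3

# %in% never returns NA, so the inner tests are safe as written.
df$var4 <- ifelse(both, 3,
           ifelse(df$var2 %in% "Y" & is.na(df$var3), 1,
           ifelse(is.na(df$var2) & df$var3 %in% "Y", 2, NA)))
df$var4
#> [1] 1 1 2 2 3 1 3
```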
Upvotes: 1
Reputation: 575
Try this:
library(dplyr)
df <- data.frame(var1 = paste0("m0",1:7),
var2 = c(rep("Y",2) ,rep(NA, 2), rep("Y", 3)),
var3 = c(rep(NA, 2), rep("Y", 3), NA, "Y"))
mutate(df, var4 = if_else(var2 == "Y",
if_else(var3 == "Y", 3, 1,1),
2, 2))
if_else from the dplyr package takes a fourth argument, missing, which supplies the value to return when the condition is NA, so the missing values are handled as well.
Upvotes: 2
Reputation: 93813
Everyone forgets about poor old interaction:
c(3,2,1,4)[interaction(lapply(dat[-1], is.na))]
#[1] 1 1 2 2 3 1 3
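The lookup vector c(3,2,1,4) only makes sense once the level order of the interaction is visible. A short sketch that unpacks the one-liner step by step, assuming dat is the question's data frame (called df here and rebuilt so the snippet runs on its own); the names missing_flags, combo and lookup are illustrative:

```r
df <- data.frame(
  var1 = sprintf("m%02d", 1:7),
  var2 = c("Y", "Y", NA, NA, "Y", "Y", "Y"),
  var3 = c(NA, NA, "Y", "Y", "Y", NA, "Y"),
  stringsAsFactors = FALSE
)

# One logical vector per column: TRUE where the value is missing.
missing_flags <- lapply(df[-1], is.na)

# Factor with one level per combination; the first column varies fastest.
combo <- interaction(missing_flags)
levels(combo)
#> [1] "FALSE.FALSE" "TRUE.FALSE"  "FALSE.TRUE"  "TRUE.TRUE"

# Indexing with a factor uses its integer codes, so the lookup values
# line up with the levels: both present -> 3, only var3 present -> 2,
# only var2 present -> 1, both missing -> 4.
lookup <- c(3, 2, 1, 4)
var4 <- lookup[combo]
var4
#> [1] 1 1 2 2 3 1 3
```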
Upvotes: 6