Alexander

Reputation: 4645

if statement with multiple columns and conditions

With the following sample data I'm trying to create a new variable "category" based on the values of three columns (type, addict, and sex).

I would like to treat type and addict as one group of conditions and sex as another group. So I use any() on each group to check whether at least one of its logical values is TRUE.

df <- data.frame(type = c(NA, "bad",NA), addict=c('visky','wine',NA),
                 sex=c(NA,'male',NA))


> df
  type addict  sex
1 <NA>  visky <NA>
2  bad   wine male
3 <NA>   <NA> <NA>

library(dplyr)


df%>%
  mutate(category=ifelse(any(is.na(type)&addict=="visky")&any(is.na(sex)),"categ1",
         ifelse(any(type=="bad"|addict=="wine")&any(!is.na(sex)),"categ2",
         ifelse(any(is.na(type)&is.na(addict))&any(is.na(sex)),"categ3",NA))))

            
  type addict  sex category
1 <NA>  visky <NA>   categ1
2  bad   wine male   categ1
3 <NA>   <NA> <NA>   categ1

As can be seen, my nested ifelse is not working correctly, and I cannot figure out why.

The expected output:

  type addict  sex category
1 <NA>  visky <NA>   categ1
2  bad   wine male   categ2
3 <NA>   <NA> <NA>   categ3

Thx in advance

Update: user-defined function for category

One more thing: if I wanted to write a user-defined function to do the same operation without using case_when, I would probably use

categ <- function(type, addict, sex) {
  if (any(is.na(type) & addict == "visky" & is.na(sex))) {
    "categ1"
  } else {
    NA
  }
}

but this also gives

df%>%
mutate(category=categ(type,addict,sex))

  type addict  sex category
1 <NA>  visky <NA>   categ1
2  bad   wine male   categ1
3 <NA>   <NA> <NA>   categ1

Upvotes: 2

Views: 3053

Answers (1)

akrun

Reputation: 887891

In the OP's input dataset, all the columns were factors and the NAs were the string "NA". Also, the OP's code wraps each condition in any(), which checks the entire column and returns a single TRUE/FALSE. That single value gets recycled to every row, which is not the intended row-wise result. If we change the columns to character class and the "NA" strings to real NAs, we can use case_when:

df %>% 
  mutate(category = case_when(
            is.na(type) & addict %in% "visky" & is.na(sex) ~ "categ1",
            (type %in% "bad" | addict %in% "wine") & !is.na(sex) ~ "categ2", 
            is.na(type) & is.na(addict) & is.na(sex) ~ "categ3", 
            TRUE ~ NA_character_))
#   type addict  sex category
#1 <NA>  visky <NA>   categ1
#2  bad   wine male   categ2
#3 <NA>   <NA> <NA>   categ3

NOTE: Here, we used %in% instead of == because == returns NA for NA elements while %in% returns FALSE. We could still use == in combination with is.na.
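As a small illustration of the note above, the following base R sketch compares the two operators on a made-up vector `x` (an assumption for the example, not part of the OP's data):

```r
# Toy vector, hypothetical and only for illustration
x <- c(NA, "bad", "visky")

# == propagates missing values: the first comparison is NA, not FALSE
eq_result <- x == "visky"

# %in% never returns NA: a missing value simply does not match
in_result <- x %in% "visky"

# == combined with is.na() reproduces the %in% result
combined <- !is.na(x) & x == "visky"

print(eq_result)   # NA FALSE TRUE
print(in_result)   # FALSE FALSE TRUE
print(combined)    # FALSE FALSE TRUE
```

An NA in a condition matters because case_when treats an NA condition as not matched, while ifelse returns NA for that row.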


Based on the OP's comments, we could also create a custom function (a different function):

categFn <- function(typeCol, addictCol, sexCol) {

           if(any(is.na(typeCol) & addictCol== "visky") & any(is.na(sexCol))) {
               "categ1"
              } else NA
            }

df %>% 
     mutate(categ = categFn(type, addict, sex))
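Because any() reduces a whole column to one value, a function built on it returns the same label for every row. If one label per row is wanted without case_when, a vectorised helper might look like the sketch below. The name `categ_rowwise` is hypothetical, and the conditions are assumed to be the three from the question:

```r
# Hypothetical helper (not from the original post): returns one label per row
categ_rowwise <- function(type, addict, sex) {
  # %in% and is.na() never return NA, so these vectors are safe for indexing
  is_categ1 <- is.na(type) & addict %in% "visky" & is.na(sex)
  is_categ2 <- (type %in% "bad" | addict %in% "wine") & !is.na(sex)
  is_categ3 <- is.na(type) & is.na(addict) & is.na(sex)

  out <- rep(NA_character_, length(type))
  # Assign in reverse priority so that earlier conditions win on overlap
  out[is_categ3] <- "categ3"
  out[is_categ2] <- "categ2"
  out[is_categ1] <- "categ1"
  out
}

df <- data.frame(type = c(NA, "bad", NA), addict = c("visky", "wine", NA),
                 sex = c(NA, "male", NA), stringsAsFactors = FALSE)

# any() collapses the column-wide test to a single TRUE
collapsed <- any(is.na(df$type) & df$addict == "visky")
print(collapsed)

df$category <- categ_rowwise(df$type, df$addict, df$sex)
print(df$category)   # "categ1" "categ2" "categ3"
```

The same helper should also work inside mutate, for example mutate(category = categ_rowwise(type, addict, sex)), since it returns a vector as long as its inputs.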

data

df <- data.frame(type = c(NA, "bad",NA), addict=c('visky','wine',NA),
                  sex=c(NA,'male',NA), stringsAsFactors = FALSE)

Upvotes: 1
