How to deal with missing values in an ifelse() statement?

Question

I have a dataframe, mydata, constructed as follows:

col1<-c(8.20e+07, 1.75e+08, NA, 4.80e+07, 
       3.40e+07, NA, 5.60e+07, 3.00e+06 )
col2<-c(1960,1960,1965,1986,1960
        ,1969,1960,1993)
col3<-c ( NA,2.190,NA,NA, 5.000, NA,
          1.700,4.220)
mydata<-data.frame(col1,col2,col3)

mydata

#       col1 col2 col3
# 1 8.20e+07 1960   NA
# 2 1.75e+08 1960 2.19
# 3       NA 1965   NA
# 4 4.80e+07 1986   NA
# 5 3.40e+07 1960 5.00
# 6       NA 1969   NA
# 7 5.60e+07 1960 1.70
# 8 3.00e+06 1993 4.22

I want to create a col4 that takes the values "a", "b" and "c": if col1 is smaller than 4.00e+07, col4 should be "a"; if col1 is at least 4.00e+07, col4 should be "b"; otherwise (col1 is NA), col4 should be "c".

Here is my code:

col4 <-ifelse(col1<4.00e+07, "a",                  
       ifelse(col1 >=4.00e+07, "b",                         
       ifelse(is.na(col1 =4.00e+07), "b",  "c" )))

but this evaluates to:

# [1] "b" "b" NA  "b" "a" NA  "b" "a"

It doesn't map the NA values in col1 to "c".

The outcome should be:

 #  [1] "b" "b" "c"  "b" "a" "c" "b" "a"

What is the problem with my code? Any suggestions would be appreciated!

Stibu · Accepted Answer

You have to check is.na first, because NA < 4.00e+07 results in NA. If the first argument of ifelse() is NA, the result will be NA as well:

ifelse(c(NA, TRUE, FALSE), "T", "F")
## [1] NA  "T" "F"

As you can see, for the first vector element the result is indeed NA. Even if the other arguments of ifelse() contain code that would handle missing values, it won't help: for any element where the test is NA, ifelse() returns NA without consulting either the yes or the no argument.
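To make this concrete for the data in the question, here is a minimal sketch. It assumes the col1 vector from the question; the names test, inner and res are only illustrative. It shows that the comparison itself already yields NA, so an is.na() check placed in an inner branch is never used for those elements.

```r
# col1 as defined in the question
col1 <- c(8.20e+07, 1.75e+08, NA, 4.80e+07,
          3.40e+07, NA, 5.60e+07, 3.00e+06)

# The comparison propagates NA for the missing elements
test <- col1 < 4.00e+07
test
## [1] FALSE FALSE    NA FALSE  TRUE    NA FALSE  TRUE

# An inner branch that would map NA to "c" on its own ...
inner <- ifelse(is.na(col1), "c", "b")

# ... is ignored wherever the outer test is NA
res <- ifelse(test, "a", inner)
res
## [1] "b" "b" NA  "b" "a" NA  "b" "a"
```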

For your example, checking for NA first gives you the desired result:

col4 <- ifelse(is.na(col1), "c",
               ifelse(col1 < 4.00e+07, "a","b"))
col4
## [1] "b" "b" "c" "b" "a" "c" "b" "a"
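As an alternative sketch (not part of the original answer), the same result can be obtained in base R without nesting ifelse(): start from the "missing" label and overwrite it by logical indexing. This assumes the col1 vector from the question; which() is used because it drops NA from the comparison, so the missing elements keep their default.

```r
# col1 as defined in the question
col1 <- c(8.20e+07, 1.75e+08, NA, 4.80e+07,
          3.40e+07, NA, 5.60e+07, 3.00e+06)

# Start with the label for missing values everywhere
col4 <- rep("c", length(col1))

# which() returns only the positions where the test is TRUE,
# silently skipping positions where it is NA
col4[which(col1 <  4.00e+07)] <- "a"
col4[which(col1 >= 4.00e+07)] <- "b"
col4
## [1] "b" "b" "c" "b" "a" "c" "b" "a"
```

Whether this reads better than the nested ifelse() is a matter of taste; it mainly helps when there are many categories and the default case should stay visible in one place.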
