Reputation: 175
I have a dataframe, mydata, constructed as follows:
col1 <- c(8.20e+07, 1.75e+08, NA, 4.80e+07, 3.40e+07, NA, 5.60e+07, 3.00e+06)
col2 <- c(1960, 1960, 1965, 1986, 1960, 1969, 1960, 1993)
col3 <- c(NA, 2.190, NA, NA, 5.000, NA, 1.700, 4.220)
mydata <- data.frame(col1, col2, col3)
mydata
# col1 col2 col3
# 1 8.20e+07 1960 NA
# 2 1.75e+08 1960 2.19
# 3 NA 1965 NA
# 4 4.80e+07 1986 NA
# 5 3.40e+07 1960 5.00
# 6 NA 1969 NA
# 7 5.60e+07 1960 1.70
# 8 3.00e+06 1993 4.22
I want to create a column col4 that takes the values "a", "b" and "c": if col1 is smaller than 4.00e+07, col4 should be "a"; if col1 is not less than 4.00e+07, col4 should be "b"; otherwise (that is, when col1 is NA), col4 should be "c".
Here is my code:
col4 <- ifelse(col1 < 4.00e+07, "a",
        ifelse(col1 >= 4.00e+07, "b",
        ifelse(is.na(col1 = 4.00e+07), "b", "c")))
but this evaluates to:
# [1] "b" "b" NA "b" "a" NA "b" "a"
It doesn't turn the NA values in col1 into "c".
The outcome should be:
# [1] "b" "b" "c" "b" "a" "c" "b" "a"
What is the problem in my code? Any suggestion would be appreciated!
Upvotes: 2
Views: 10762
Reputation: 15907
You have to check is.na first, because NA < 4.00e+07 results in NA. Wherever the first argument (the test) of ifelse() is NA, the result is NA as well:
ifelse(c(NA, TRUE, FALSE), "T", "F")
## [1] NA "T" "F"
As you can see, the result for the first vector element is indeed NA. Even if the other arguments of ifelse() contain code meant to handle this situation (such as the nested ifelse() calls in your code), it won't help, because ifelse() never uses the yes or no values at positions where the test is NA.
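A minimal sketch of this behaviour, using a three-element vector assumed for illustration: the comparison yields NA for the missing value, and ifelse() keeps NA at that position even though a fallback value is passed as no.

```r
# Small example vector (assumed for illustration): one value above,
# one missing, one below the 4e7 threshold
x <- c(8.20e+07, NA, 3.00e+06)

test <- x < 4.00e+07
print(test)
# [1] FALSE    NA  TRUE

# A "no" value of "c" cannot fill the NA position, because
# ifelse() uses neither yes nor no where the test is NA
res <- ifelse(test, "a", "c")
print(res)
# [1] "c" NA  "a"
```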
For your example, checking for NA first gives you the desired result:
col4 <- ifelse(is.na(col1), "c",
ifelse(col1 < 4.00e+07, "a","b"))
col4
## [1] "b" "b" "c" "b" "a" "c" "b" "a"
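If the new column should live inside mydata rather than as a separate vector, the same expression can be evaluated with with() and assigned back. A minimal sketch, rebuilding the example data frame from the question so it runs on its own:

```r
col1 <- c(8.20e+07, 1.75e+08, NA, 4.80e+07, 3.40e+07, NA, 5.60e+07, 3.00e+06)
col2 <- c(1960, 1960, 1965, 1986, 1960, 1969, 1960, 1993)
col3 <- c(NA, 2.190, NA, NA, 5.000, NA, 1.700, 4.220)
mydata <- data.frame(col1, col2, col3)

# Test for NA first; the comparison with 4e7 then only decides
# the non-missing rows
mydata$col4 <- with(mydata, ifelse(is.na(col1), "c",
                                   ifelse(col1 < 4.00e+07, "a", "b")))
mydata$col4
## [1] "b" "b" "c" "b" "a" "c" "b" "a"
```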
Upvotes: 6
Reputation: 887118
This can also be done with cut():
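# right = FALSE gives intervals [-Inf, 4e7) -> "a" and [4e7, Inf) -> "b",
# so a value of exactly 4e7 is "b", as the question requires
v1 <- with(mydata, as.character(cut(col1,
      breaks = c(-Inf, 4.00e+07, Inf), labels = c("a", "b"), right = FALSE)))
# cut() returns NA for missing col1; replace those with "c"
v1[is.na(v1)] <- "c"
v1
#[1] "b" "b" "c" "b" "a" "c" "b" "a"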
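A related base-R option, swapped in here and not part of the answer above, is findInterval(). With the single break 4e7 it returns 0 for values below the break and 1 for values at or above it, which matches the "smaller than" / "not less than" split in the question; missing inputs give NA, which is then replaced by "c". A sketch, assuming col1 from the question:

```r
col1 <- c(8.20e+07, 1.75e+08, NA, 4.80e+07, 3.40e+07, NA, 5.60e+07, 3.00e+06)

# findInterval() counts how many breaks are <= each value:
# 0 for col1 < 4e7, 1 for col1 >= 4e7, NA for missing col1
idx <- findInterval(col1, 4.00e+07)

# Shift to 1-based positions and look up the labels; an NA index gives NA
v2 <- c("a", "b")[idx + 1]
v2[is.na(v2)] <- "c"
v2
## [1] "b" "b" "c" "b" "a" "c" "b" "a"
```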
Upvotes: 3