Reputation: 8474
I have quite a simple question which I am currently struggling with. If I have an example dataframe:
a <- c(1:5)
b <- c(1,3,5,9,11)
df1 <- data.frame(a,b)
How do I create a new column ('c') which is then populated using if statements on column b? For example:
'cat' for those values in b which are 1 or 2
'dog' for those values in b which are between 3 and 5
'rabbit' for those values in b which are greater than 6
So column 'c' using dataframe df1 would read: cat, dog, dog, rabbit, rabbit.
Many thanks in advance.
Upvotes: 5
Views: 20800
Reputation: 263301
df1$c <- c("cat", "dog", "rabbit")[ findInterval(df1$b, c(1, 2.5, 5.5, Inf)) ]
The findInterval approach will be much faster than nested ifelse strategies, and I'm guessing very much faster than a function that loops over unnested if statements. Those of us working with bigger data do notice the differences when we pick inefficient algorithms.
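To make the mechanics concrete, here is a minimal sketch of what findInterval returns for the example data and how those indices pick labels. The names breaks, idx and labels are illustrative only, and the label order (cat, dog, rabbit) is assumed from the question.

```r
# Example column from the question
b <- c(1, 3, 5, 9, 11)

# Assumed cut points: [1, 2.5) -> 1, [2.5, 5.5) -> 2, [5.5, Inf) -> 3
breaks <- c(1, 2.5, 5.5, Inf)

# findInterval() returns, for each value, the index of the interval it falls in
idx <- findInterval(b, breaks)
idx
# 1 2 2 3 3

# Use the interval index to look up a label
labels <- c("cat", "dog", "rabbit")
labels[idx]
# "cat" "dog" "dog" "rabbit" "rabbit"
```

Under these assumed breaks, a value below 1 would get index 0, and indexing with 0 silently drops that element, so the result could end up shorter than the column. It may be worth checking the range of b first.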
This didn't actually address the request, but I don't think new users of R will always know the most expressive or efficient approach to problems. A request to "use IF" sounded like an effort to translate coding approaches typical of the two major macro statistical processors, SPSS and SAS. The R if control structure is not generally an efficient approach to recoding a column, since only the first element of its condition is used (recent versions of R raise an error when the condition has length greater than one). On its own it doesn't process a column, whereas the ifelse function will do so. The cut function might have been used here (with appropriate breaks and labels parameters), although it would have delivered a factor value instead of a character value. The findInterval approach was chosen for its ability to return multiple levels (which a single ifelse cannot). I think chaining or nesting ifelse calls quickly becomes ugly and confusing after about 2 or 3 levels of nesting.
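As a rough illustration of the cut alternative mentioned above, the sketch below reuses the same assumed break points and converts the resulting factor to character. The choice of right = FALSE (left-closed intervals) and the variable names f and c_chr are assumptions made for this example, not part of the original answer.

```r
# Example column from the question
b <- c(1, 3, 5, 9, 11)

# cut() bins the values and attaches labels; right = FALSE makes the
# intervals [1, 2.5), [2.5, 5.5), [5.5, Inf)
f <- cut(b,
         breaks = c(1, 2.5, 5.5, Inf),
         labels = c("cat", "dog", "rabbit"),
         right = FALSE)

class(f)
# "factor"

# Convert to character if a plain character column is wanted
c_chr <- as.character(f)
c_chr
# "cat" "dog" "dog" "rabbit" "rabbit"
```

Values that fall outside all intervals come back as NA from cut, which can be easier to spot than a dropped element.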
Upvotes: 6
Reputation: 44638
Although ifelse() is useful, sometimes it doesn't provide what one would intuitively expect. So, I like to write it out.
a <- c(1:5)
b <- c(1,3,5,9,11)
df1 <- data.frame(a,b)
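species <- function(x) {
  # Scalar classifier; values matching no rule (such as 6) return NA.
  if (x == 1 || x == 2) return("cat")
  if (x > 2 && x < 6) return("dog")
  if (x > 6) return("rabbit")
  NA_character_
}
# sapply() calls species() once per element of b.
df1$c <- sapply(df1$b, species)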
Upvotes: 2
Reputation: 6104
df1 <-
  transform(
    df1 ,
    c =
      ifelse( b %in% 1:2 , 'cat' ,
      ifelse( b %in% 3:5 , 'dog' , 'rabbit' ) ) )
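One thing to be aware of with this pattern is that the final 'rabbit' acts as a catch-all. The sketch below uses a made-up vector (b_test, not from the question) to show that effect, along with one possible variant that spells out the third rule and returns NA otherwise. The b > 6 condition is taken from the question's wording.

```r
# Hypothetical test values, including ones the question's rules do not cover
b_test <- c(0, 1, 3, 6, 9)

# Catch-all version: anything not in 1:5 becomes "rabbit"
catch_all <- ifelse(b_test %in% 1:2, "cat",
             ifelse(b_test %in% 3:5, "dog", "rabbit"))
catch_all
# "rabbit" "cat" "dog" "rabbit" "rabbit"

# Explicit version: unmatched values become NA instead
explicit <- ifelse(b_test %in% 1:2, "cat",
            ifelse(b_test %in% 3:5, "dog",
            ifelse(b_test > 6, "rabbit", NA_character_)))
explicit
# NA "cat" "dog" NA "rabbit"
```

For the example data in the question both versions give the same result, so the difference only matters if b can contain other values.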
Upvotes: 2