KT_1

Reputation: 8474

Populate a column using if statements in r

I have quite a simple question which I am currently struggling with. If I have an example dataframe:

a <- c(1:5)  
b <- c(1,3,5,9,11)
df1 <- data.frame(a,b)

How do I create a new column ('c') that is populated using if statements on column b? For example:

'cat' for those values in b which are 1 or 2
'dog' for those values in b which are between 3 and 5
'rabbit' for those values in b which are greater than 6

So column 'c' using dataframe df1 would read: cat, dog, dog, rabbit, rabbit.

Many thanks in advance.

Upvotes: 5

Views: 20800

Answers (3)

IRTFM

Reputation: 263301

df1$c <- c("cat", "dog", "rabbit")[findInterval(df1$b, c(1, 2.5, 5.5, Inf))]

The findInterval approach will be much faster than nested ifelse strategies, and I'm guessing very much faster than a function that loops over unnested if statements. Those of us working with bigger data do notice the differences when we pick inefficient algorithms.
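To make the lookup concrete, here is a minimal sketch on the question's data. The names breaks, idx and labels are illustrative, and the break points 2.5 and 5.5 are one possible choice for separating the three groups.

```r
b <- c(1, 3, 5, 9, 11)

# findInterval() returns, for each value, the index of the interval it falls in:
# [1, 2.5) -> 1, [2.5, 5.5) -> 2, [5.5, Inf) -> 3
breaks <- c(1, 2.5, 5.5, Inf)
idx <- findInterval(b, breaks)

# Use the interval index to pick a label; label order must match interval order
labels <- c("cat", "dog", "rabbit")
result <- labels[idx]
print(result)
```

Values below the first break get index 0, and indexing with 0 silently drops the element, so this sketch assumes every value of b is at least 1.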

This didn't literally address the request to use if statements, but new users of R won't always know the most expressive or efficient approach to a problem. A request to "use IF" sounded like an effort to translate coding approaches typical of the two major macro statistical processors, SPSS and SAS. The R if control structure is not generally an efficient approach to recoding a column, since it tests a single condition: given a vector, older versions of R used only the first element (with a warning), and R 4.2.0 and later raise an error. On its own it doesn't process a column, whereas the ifelse function does. The cut function could also have been used here (with appropriate breaks and labels arguments), although it would have returned a factor rather than a character vector. I chose findInterval for its ability to return multiple levels, which a single ifelse cannot. Chaining or nesting ifelse calls quickly becomes ugly and confusing after about 2 or 3 levels of nesting.
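For comparison, a rough sketch of the cut alternative mentioned above. The breaks 0, 2, 5 and Inf are an assumption chosen to fit the example data (with them, a value of 6 would be labelled 'rabbit'), and the variable names are illustrative.

```r
b <- c(1, 3, 5, 9, 11)

# cut() uses right-closed intervals by default: (0, 2], (2, 5], (5, Inf]
f <- cut(b, breaks = c(0, 2, 5, Inf), labels = c("cat", "dog", "rabbit"))

# The result is a factor; convert it if a character column is wanted
is.factor(f)
chr <- as.character(f)
print(chr)

# ifelse() is vectorised, so it handles a whole column in one call,
# but a single call can only distinguish two outcomes
two_way <- ifelse(b <= 2, "cat", "other")
```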

Upvotes: 6

Brandon Bertelsen

Reputation: 44638

Although ifelse() is useful, sometimes it doesn't provide what one would intuitively expect. So, I like to write it out.

a <- c(1:5)  
b <- c(1,3,5,9,11)
df1 <- data.frame(a,b)

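species <- function(x) {
  # x is a single value; sapply() below calls this once per element
  y <- NA_character_  # values matching no rule (e.g. 6) stay NA
  if (x == 1 || x == 2) y <- "cat"
  if (x > 2 && x < 6) y <- "dog"
  if (x > 6) y <- "rabbit"
  return(y)
}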

df1$c <- sapply(df1$b,species)
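As an illustration of how ifelse() can surprise, here is one commonly cited case; it may or may not be the one meant above. The result takes its attributes from the test, not from the values, so classes such as Date are dropped. The variable names are illustrative.

```r
dates <- as.Date(c("2020-01-01", "2021-01-01"))

# Both branches are Dates, yet the result comes back as plain numbers
x <- ifelse(c(TRUE, FALSE), dates, dates)
class(dates)  # "Date"
class(x)      # "numeric"
```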

Upvotes: 2

Anthony Damico

Reputation: 6104

df1 <- 
    transform(
        df1 ,
        c =
            ifelse( b %in% 1:2 , 'cat' ,
            ifelse( b %in% 3:5 , 'dog' , 'rabbit' ) ) )

Upvotes: 2
