poeticpersimmon

Reputation: 179

What is the simplest way to recode a variable based on conditions of another variable in R?

Silly example df, "cat":

species color tail_length
calico  brown     6
calico  gray      6
tabby   multi     5
tabby   brown     5

Suppose I want to create a new variable, personality. The values here will be recoded based on tail_length, but will also be conditional upon the species and color of the cat. So the ideal final df would look like this:

species color tail_length personality
calico  brown     6          mean
calico  gray      6          nice
tabby   multi     5          mean
tabby   brown     5          nice

At present, I'm using this code:

library(car)
# car::recode takes "old=new" pairs separated by semicolons
cat$personality <- recode(cat$tail_length, "6='mean'; 5='nice'")
# the replacement values must be quoted strings, not bare names
cat$personality[cat$species == "calico" & cat$color == "brown"] <- "mean"
cat$personality[cat$species == "calico" & cat$color == "gray"] <- "nice"
cat$personality[cat$species == "tabby" & cat$color == "multi"] <- "mean"
cat$personality[cat$species == "tabby" & cat$color == "brown"] <- "nice"

My main question is this: is there a simpler way to do this/consolidate these functions into one? Given that I made up this example data on the fly, please take it with a grain of salt when answering. Thanks! As an R beginner, I really appreciate your help.

Upvotes: 3

Views: 2945

Answers (3)

Tyler Rinker

Reputation: 109864

Here's one approach using qdap and qdapTools (CRAN packages that I maintain):

library(qdap); library(qdapTools)

key <- list(
    mean = c("calico.brown", "tabby.multi"),
    nice = c("calico.gray", "tabby.brown")
)

dat[["personality"]] <- paste2(dat[1:2]) %l% key
dat

##   species color tail_length personality
## 1  calico brown           6        mean
## 2  calico  gray           6        nice
## 3   tabby multi           5        mean
## 4   tabby brown           5        nice

Basically you create a key that's a named list based on the combined columns. Then %l% acts as a hash table lookup.
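The same keyed-lookup idea can be sketched in base R without extra packages. This is only an illustration, not the qdapTools implementation: it assumes the data frame is called dat and uses the species/color mapping from the question's desired output, with a named character vector standing in for the key.

```r
# Example data from the question (named dat here, as in this answer)
dat <- data.frame(
  species     = c("calico", "calico", "tabby", "tabby"),
  color       = c("brown", "gray", "multi", "brown"),
  tail_length = c(6, 6, 5, 5),
  stringsAsFactors = FALSE
)

# Named vector: names are "species.color" combinations, values are personalities
key <- c(
  calico.brown = "mean",
  calico.gray  = "nice",
  tabby.multi  = "mean",
  tabby.brown  = "nice"
)

# Build the combined key for each row, then index the named vector by it
combo <- paste(dat$species, dat$color, sep = ".")
dat$personality <- unname(key[combo])
dat
```

Any combination that is missing from key comes back as NA, which makes unmapped rows easy to spot.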

Upvotes: 1

IRTFM

Reputation: 263332

This is really just a merge operation. (Furthermore, you have overspecified the criteria, since species and tail_length are completely dependent. But since it's only an example, that may not be an issue.) Let's say your first dataframe is dat and the criteria dataframe is lookup, with one row per combination and the personality to assign. Then all you need to do is:

> merge(dat, lookup)
  species color tail_length personality
1  calico brown           6        mean
2  calico  gray           6        nice
3   tabby brown           5        nice
4   tabby multi           5        mean

Not a very interesting or dramatic result because it looks just like the lookup dataframe, but give it something a bit larger and:

> merge( rbind(dat,dat,dat) , lookup)
   species color tail_length personality
1   calico brown           6        mean
2   calico brown           6        mean
3   calico brown           6        mean
4   calico  gray           6        nice
5   calico  gray           6        nice
6   calico  gray           6        nice
7    tabby brown           5        nice
8    tabby brown           5        nice
9    tabby brown           5        nice
10   tabby multi           5        mean
11   tabby multi           5        mean
12   tabby multi           5        mean
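The answer does not show how lookup was built, so here is one possible sketch. The contents of lookup below are an assumption taken from the question's desired output, and tail_length is left out of it because species and color already identify each row.

```r
# Example data from the question
dat <- data.frame(
  species     = c("calico", "calico", "tabby", "tabby"),
  color       = c("brown", "gray", "multi", "brown"),
  tail_length = c(6, 6, 5, 5),
  stringsAsFactors = FALSE
)

# Assumed criteria table: one row per species/color combination
lookup <- data.frame(
  species     = c("calico", "calico", "tabby", "tabby"),
  color       = c("brown", "gray", "multi", "brown"),
  personality = c("mean", "nice", "mean", "nice"),
  stringsAsFactors = FALSE
)

# Join on the shared key columns; naming them explicitly avoids surprises
# if the two data frames ever share other column names
result <- merge(dat, lookup, by = c("species", "color"))
result
```

By default merge() sorts the result by the key columns, so the row order can differ from dat. It also drops rows of dat that have no match in lookup unless all.x = TRUE is passed.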

Upvotes: 0

shadowtalker

Reputation: 13833

There isn't much you can do here because, at the end of the day, you still need to specify the conditions and new variables to assign.

However, you can cut down on the boilerplate code by using within:

library(car)

# within() returns a modified copy, so assign the result back
cat <- within(cat, {
  personality <- recode(tail_length, "6='mean'; 5='nice'")
  personality[species == "calico" & color == "brown"] <- "mean"
  personality[species == "calico" & color == "gray"] <- "nice"
  personality[species == "tabby" & color == "multi"] <- "mean"
  personality[species == "tabby" & color == "brown"] <- "nice"
})
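As a rough sketch of how this could be trimmed further, the version below drops the car dependency by using base ifelse() for the tail_length default, and then overrides only the two combinations that differ from that default. It assumes the example data from the question and is one possible arrangement, not the only one.

```r
# Example data from the question
cat <- data.frame(
  species     = c("calico", "calico", "tabby", "tabby"),
  color       = c("brown", "gray", "multi", "brown"),
  tail_length = c(6, 6, 5, 5),
  stringsAsFactors = FALSE
)

cat <- within(cat, {
  # Default by tail length: 6 -> "mean", anything else -> "nice"
  personality <- ifelse(tail_length == 6, "mean", "nice")
  # Override the combinations that deviate from the default
  personality[species == "calico" & color == "gray"] <- "nice"
  personality[species == "tabby" & color == "multi"] <- "mean"
})
cat
```

Only the exceptions need their own line here, which is where the savings come from when most rows follow the default.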

Upvotes: 0
