IDLHTA

Reputation: 5

Applying if else statements involving two dataframe columns in R

I am trying to modify a two-column data frame by adding a third column that returns one of four possible values depending on the contents of the other columns (i.e. whether each is positive or negative).

I have tried a couple of approaches: the mutate function in dplyr as well as sapply. Unfortunately I seem to be missing something, as I get the warning "the condition has length > 1 and only the first element will be used". As a result, only the value computed for the first row is applied to every row of the new column.

A reproducible example (of the mutate approach I've tried) is as follows:

library(dplyr)

Costs <- c(2, -5, -7, 3, 12)
Outcomes <- c(-2, 5, -7, 3, -2)

results <- as.data.frame(cbind(Costs, Outcomes))
results

quadrant <- function(cost,outcome) {
        if (costs < 0 &
            outcomes < 0) {
                "SW Quadrant"
        }
        else if (costs<0 & outcomes>0){
                "Dominant"
        } 
        else if (costs>0 & outcomes<0){
                "Dominated"
        }
        else{""}
}


results <- mutate(results, Quadrant = quadrant(Costs, Outcomes))

The full warning message is:

Warning messages:
1: Problem with mutate() input Quadrant.
i the condition has length > 1 and only the first element will be used
i Input Quadrant is quadrant(results$Costs, results$Outcomes).
2: In if (costs < 0 & outcomes < 0) { :
  the condition has length > 1 and only the first element will be used
3: Problem with mutate() input Quadrant.
i the condition has length > 1 and only the first element will be used
i Input Quadrant is quadrant(results$Costs, results$Outcomes).
4: In if (costs < 0 & outcomes > 0) { :
  the condition has length > 1 and only the first element will be used
5: Problem with mutate() input Quadrant.
i the condition has length > 1 and only the first element will be used
i Input Quadrant is quadrant(results$Costs, results$Outcomes).
6: In if (costs > 0 & outcomes < 0) { :
  the condition has length > 1 and only the first element will be used

My attempt at the sapply function:

results <- sapply(results$Quadrant,quadrant(results$Costs,results$Outcomes))

This leads to the following error, along with the same warnings as the mutate approach:

Error in get(as.character(FUN), mode = "function", envir = envir) : object 'Dominated' of mode 'function' was not found

I'm sure I'm missing something obvious here. Grateful for any suggestions.

Upvotes: 0

Views: 212

Answers (2)

r2evans

Reputation: 160447

There are two things going wrong with that function.

  1. You define the function's arguments as cost and outcome, but the body uses costs and outcomes;
  2. You use if, which strictly requires a logical condition of length 1, and two things break that: you use &, which should almost never appear bare like this in an if condition (use && there instead), and you are passing vectors, so cost < 0 returns a logical vector the same length as cost (5 here, so greater than 1). Since R 4.2.0 this is an error rather than a warning.
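A short console sketch of both points, using the question's values (the lowercase names are just local copies for illustration):

```r
cost <- c(2, -5, -7, 3, 12)
outcome <- c(-2, 5, -7, 3, -2)

# A comparison on a vector returns one logical per element, not a single TRUE/FALSE
cost < 0
# [1] FALSE  TRUE  TRUE FALSE FALSE

# & is element-wise, so the combined condition is still length 5 and unusable in if()
cost < 0 & outcome < 0
# [1] FALSE FALSE  TRUE FALSE FALSE

# && is meant for length-1 operands, i.e. one cost/outcome pair at a time
cost[3] < 0 && outcome[3] < 0
# [1] TRUE
```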

Suggestions:

quadrant_sgl <- function(cost, outcome) {
  if (cost < 0 && outcome < 0) return("SW Quadrant")
  if (cost < 0 && outcome > 0) return("Dominant")
  if (cost > 0 && outcome < 0) return("Dominated")
  return("")
}

quadrant_vec1 <- function(cost, outcome) {
  ifelse(cost < 0 & outcome < 0, "SW Quadrant",
         ifelse(cost < 0 & outcome > 0, "Dominant",
                ifelse(cost > 0 & outcome < 0, "Dominated",
                       "")))
}

quadrant_vec2 <- function(cost, outcome) {
  ifelse(cost < 0,
         ifelse(outcome < 0, "SW Quadrant", "Dominant"),
         ifelse(outcome < 0, "Dominated", ""))
}

quadrant_vec3 <- function(cost, outcome) {
  dplyr::case_when(
    cost < 0 & outcome < 0 ~ "SW Quadrant",
    cost < 0 & outcome > 0 ~ "Dominant",
    cost > 0 & outcome < 0 ~ "Dominated",
    TRUE ~ ""
  )
}

quadrant_vec4 <- function(cost, outcome) {
  data.table::fcase(
    cost < 0 & outcome < 0, "SW Quadrant",
    cost < 0 & outcome > 0, "Dominant",
    cost > 0 & outcome < 0, "Dominated",
    rep(TRUE, length(cost)), ""
  )
}

The first function (quadrant_sgl) is deliberately left single-operation (not vectorized): it handles one cost/outcome pair at a time, and base R's Vectorize turns it into a vectorized function. If you aren't familiar with the concept of vectorization, know that (1) R does it well, (2) R prefers it, and (3) this is not the best venue to talk at length about it. Search for "R vectorization" and you should find plenty of material on this.

Because of this, the first one is just a demonstration of what to do when a function cannot (due to time, programming skill, or something else) be rewritten in vectorized form: wrap it with Vectorize.
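Vectorize() works by wrapping the function in a call to mapply(), so calling mapply() directly gives the same result. This is also roughly what the question's sapply attempt needed: the apply family takes the function itself as an argument, whereas quadrant(results$Costs, results$Outcomes) is evaluated first and its value ("Dominated") is then looked up as a function name, hence the "object 'Dominated' of mode 'function' was not found" error. A minimal sketch (quadrant_sgl is repeated so the block runs on its own, and results is rebuilt with data.frame() for brevity):

```r
quadrant_sgl <- function(cost, outcome) {
  if (cost < 0 && outcome < 0) return("SW Quadrant")
  if (cost < 0 && outcome > 0) return("Dominant")
  if (cost > 0 && outcome < 0) return("Dominated")
  return("")
}

results <- data.frame(Costs = c(2, -5, -7, 3, 12),
                      Outcomes = c(-2, 5, -7, 3, -2))

# Pass the function itself; mapply calls it once per (cost, outcome) pair
quad_mapply <- mapply(quadrant_sgl, results$Costs, results$Outcomes)

# Vectorize() returns a wrapper that performs the same mapply call internally
quadrant_v <- Vectorize(quadrant_sgl, vectorize.args = c("cost", "outcome"))
quad_vectorize <- quadrant_v(results$Costs, results$Outcomes)

identical(quad_mapply, quad_vectorize)
# [1] TRUE
```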

The other functions are all vectorized and, for this data, give identical results.

If you are using dplyr and friends, then I strongly recommend quadrant_vec3, since it is (IMO) much easier to read and maintain than nested ifelse calls. (BTW: if you must nest, then at least nest dplyr::if_else, which is generally safer than base R's ifelse because it checks that the true and false values have compatible types.)
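A quick illustration of that safety difference, assuming dplyr is installed: ifelse() silently coerces mismatched branches to a common type, while dplyr::if_else() raises an error. The exact error message differs between dplyr versions, so the sketch only checks that an error occurs:

```r
library(dplyr)

cond <- c(TRUE, FALSE)

# Base R's ifelse() quietly turns the numeric 1 into the string "1"
ifelse(cond, 1, "a")
# [1] "1" "a"

# dplyr::if_else() refuses to mix a double and a character branch
res <- tryCatch(if_else(cond, 1, "a"), error = function(e) "error")
res
# [1] "error"
```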

If you are venturing into the world of data.table, then quadrant_vec4 is the equivalent using data.table's own fcase function, which works mostly the same way as case_when.
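A minimal sketch of using fcase on a data.table, assuming a data.table version recent enough to include fcase is installed. This variant passes the catch-all through fcase's default argument instead of a rep(TRUE, length(cost)) condition; both leave unmatched rows as "". The := operator adds the column by reference, so the table does not need to be reassigned:

```r
library(data.table)

quadrant_vec4 <- function(cost, outcome) {
  fcase(
    cost < 0 & outcome < 0, "SW Quadrant",
    cost < 0 & outcome > 0, "Dominant",
    cost > 0 & outcome < 0, "Dominated",
    default = ""  # used for rows where no condition is TRUE
  )
}

results_dt <- data.table(Costs = c(2, -5, -7, 3, 12),
                         Outcomes = c(-2, 5, -7, 3, -2))

# Add the Quadrant column in place, computed from whole columns at once
results_dt[, Quadrant := quadrant_vec4(Costs, Outcomes)]
results_dt
```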

Demo:

Vectorize(quadrant_sgl, vectorize.args = c("cost", "outcome"))(results$Costs, results$Outcomes)
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"  
quadrant_vec1(results$Costs, results$Outcomes)
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"  
quadrant_vec2(results$Costs, results$Outcomes)
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"  
quadrant_vec3(results$Costs, results$Outcomes)
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"  
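To fill in the column the question was after, any of the vectorized versions can be dropped straight into the original mutate() call. A sketch with quadrant_vec3, repeated here (using library(dplyr) in place of the dplyr:: prefix) so the block is self-contained:

```r
library(dplyr)

quadrant_vec3 <- function(cost, outcome) {
  case_when(
    cost < 0 & outcome < 0 ~ "SW Quadrant",
    cost < 0 & outcome > 0 ~ "Dominant",
    cost > 0 & outcome < 0 ~ "Dominated",
    TRUE ~ ""
  )
}

results <- data.frame(Costs = c(2, -5, -7, 3, 12),
                      Outcomes = c(-2, 5, -7, 3, -2))

# The function now receives whole columns and returns one label per row
results <- mutate(results, Quadrant = quadrant_vec3(Costs, Outcomes))
results$Quadrant
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"
```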

Upvotes: 2

DaveArmstrong

Reputation: 21937

You probably want something more like:

library(dplyr)

costs <- c(2, -5, -7, 3, 12)
outcomes <- c(-2, 5, -7, 3, -2)

results <- as.data.frame(cbind(costs, outcomes))

results <- results %>% mutate(Quadrant = case_when(
  outcomes < 0 & costs < 0 ~ "SW Quadrant", 
  costs < 0 & outcomes > 0 ~ "Dominant", 
  costs > 0 & outcomes < 0 ~ "Dominated", 
  TRUE ~ ""))

results
#   costs outcomes    Quadrant
# 1     2       -2   Dominated
# 2    -5        5    Dominant
# 3    -7       -7 SW Quadrant
# 4     3        3            
# 5    12       -2   Dominated

Upvotes: 2
