Jeremy

Reputation: 43

writing an R function with if statement that relies on multiple columns of a dataframe

I'm trying to write an R function built around an if statement: when a condition on column A is true, it applies a calculation to the value in column B; otherwise it just returns the value from column B. I'm sure this is easy to do and I'm just missing something basic, but I am struggling. Is there a good way to do this?

Here's an example of what I tried

example_df <- data.frame(
  type = c("oranges", "apples", "oranges", "oranges", "apples"),
  sold = c(6, 7, 1, 4, 1)
)

multiply_oranges <- function(x) {
  if (x$type == "oranges") {
    x$sold * 10
  } else {
    x$sold
  }
}
lapply(example_df, multiply_oranges)

But that gives me

Error: $ operator is invalid for atomic vectors

and I'm having trouble understanding what that means/how to fix it.

Any help in either fixing this function or showing me a better way to do this would be much appreciated. Thanks!

Upvotes: 0

Views: 52

Answers (2)

Edward

Reputation: 19484

I wonder if this is what you're after:

library(dplyr)
example_df %>% 
  mutate(Cost=ifelse(type=="oranges", sold*10, sold))
    type sold Cost
1 oranges    6   60
2  apples    7    7
3 oranges    1   10
4 oranges    4   40
5  apples    1    1

But hard-coding the multiplier inside ifelse gets unwieldy, especially if you want to add more fruit. It is easier to keep another data frame containing the price of each fruit.

Prices <- data.frame(price=c(10,5), type=c("oranges","apples"))
Prices
  price    type
1    10 oranges
2     5  apples

Then join them together and calculate the net price:

library(dplyr)  # inner_join() and mutate() come from dplyr, not tidyr
example_df %>% 
  inner_join(Prices) %>%
  mutate(Net=sold*price)
Joining, by = "type"
     type sold price Net
1 oranges    6    10  60
2  apples    7     5  35
3 oranges    1    10  10
4 oranges    4    10  40
5  apples    1     5   5
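
If you would rather not pull in a package for this, the same lookup idea can be sketched in base R with a named vector. This is only an illustration: `price_lookup` is a made-up name, and the prices are the ones from the Prices frame above.

```r
example_df <- data.frame(
  type = c("oranges", "apples", "oranges", "oranges", "apples"),
  sold = c(6, 7, 1, 4, 1)
)

# Hypothetical lookup table: one price per fruit, keyed by fruit name
price_lookup <- c(oranges = 10, apples = 5)

# as.character() guards against `type` being a factor (the default before R 4.0),
# which would otherwise index by integer code instead of by name
example_df$price <- unname(price_lookup[as.character(example_df$type)])
example_df$Net <- example_df$sold * example_df$price

example_df$Net
# [1] 60 35 10 40  5
```

A fruit that is missing from the lookup comes back as NA here, whereas inner_join would drop that row, so the two approaches are not identical in that case.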

Upvotes: 2

r2evans

Reputation: 160952

  1. lapply is not necessary; it would be useful if you had a list of frames (even just one), but you don't. A data frame is itself a list of columns, so lapply hands your function one column at a time. It "unrolls" as:

    multiply_oranges(example_df$type)
    multiply_oranges(example_df$sold)
    

    Which is not what (I think) you intend.

  2. Your if is wrong. R's if requires that its condition be length 1; if it is longer, it will warn you with:

    Warning in if (x$type == "oranges") { :
      the condition has length > 1 and only the first element will be used
    

    which is effectively telling you that only the first value in $type is used to decide the branch for the whole vector, which is also (I believe) not what you intend. (Since R 4.2.0, a condition longer than 1 is an error rather than a warning.) Instead, use ifelse.
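
The first point is also where the "$ operator is invalid for atomic vectors" message comes from. Here is a small sketch you can run to see it. The names `seen` and `err` are just for illustration, and tryCatch is only there to capture the error message instead of stopping the script.

```r
example_df <- data.frame(
  type = c("oranges", "apples", "oranges", "oranges", "apples"),
  sold = c(6, 7, 1, 4, 1)
)

# lapply() walks over the columns of the frame, not its rows;
# each column arrives as a plain (atomic) vector
seen <- lapply(example_df, function(col) is.atomic(col))
names(seen)
# [1] "type" "sold"

# Inside the function, x is therefore a vector, and `$` cannot be used on it
err <- tryCatch(
  lapply(example_df, function(x) x$type),
  error = function(e) conditionMessage(e)
)
err
```

So the function never gets as far as the if; it already fails on x$type for the first column.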

Try this:

multiply_oranges <- function(x) x$sold * ifelse(x$type == "oranges", 10, 1)
multiply_oranges(example_df)
# [1] 60  7 10 40  1

ifelse evaluates the condition for each element of the vector. If you run the pieces one at a time, you'll see

x <- example_df  # stand-in for the function argument
x$type == "oranges"
# [1]  TRUE FALSE  TRUE  TRUE FALSE
ifelse(x$type == "oranges", 10, 1)
# [1] 10  1 10 10  1
x$sold * ifelse(x$type == "oranges", 10, 1)
# [1] 60  7 10 40  1
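
If what you want in the end is a new column rather than a bare vector, one possible base R sketch is below. The column name `adjusted` is made up for the example; with() just lets you refer to the columns without repeating example_df$.

```r
example_df <- data.frame(
  type = c("oranges", "apples", "oranges", "oranges", "apples"),
  sold = c(6, 7, 1, 4, 1)
)

# Vectorised condition: multiply by 10 where type is "oranges", else keep sold
example_df$adjusted <- with(example_df, ifelse(type == "oranges", sold * 10, sold))

example_df$adjusted
# [1] 60  7 10 40  1
```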

Upvotes: 2
