Long_NgV

Reputation: 475

Calculation with multiple conditions

Hi guys, I have this data frame in R:

m2 <- c(22,NA,0,NA,42,NA)
m3 <- c(89,38,0,67,0,NA)
df = data.frame(m2,m3)

I want to calculate the return between m3 and m2. The formula is: return = (m2[i] - m3[i])/m3[i]. The conditions for the calculation are:

  1. if m2[i] = NA or m3[i] = NA then the result = NA
  2. if m2[i] = 0 and m3[i] = 0 then the result = -9999
  3. if m2[i] != 0 and m3[i] = 0 then the result = 9999

So far I have tried this code:

   for (i in nrow(df)){
      if (is.na(df[['m2']][i]) == TRUE | is.na(df[['m3']][i]) == TRUE){df[['result']][i] = NA}
      if (df[['m2']][i] == 0 & df[['m3']][i] == 0) {df[['result']][i] = 9999}
      if (df[['m3']][i] == 0 | df[['m2']][i] != 0) {df[['result']][i] = -9999}
      else {df[['result']][i] = (df[['m2']][i] - df[['m3']][i])/df[['m3']][i]}
    }

But it returns the error shown below:

 Error in if (df[["m2"]][i] == 0 & df[["m3"]][i] == 0) { : 
  missing value where TRUE/FALSE needed

I tried the same method in Python and it works. Is there a way to do this in R, and should I calculate the returns without using a for loop?

Upvotes: 0

Views: 371

Answers (4)

user10917479

What your logic really says is that you want to replace:

NaN with -9999 (occurs for 0/0)
Inf or -Inf with 9999 (occurs for x/0 with x != 0)

So you can simply apply your formula and then substitute later. It just seems a little less convoluted to me to look at. I avoid 'if-then' logic whenever possible.

base R solution:

df$return <- (df$m2 - df$m3) / df$m3
df[is.nan(df$return),"return"] <- -9999
df[is.infinite(df$return),"return"] <- 9999

dplyr solution:

library(dplyr)

df %>%
  mutate(return = (m2 - m3) / m3,
         return = if_else(is.nan(return), -9999, return),
         return = if_else(is.infinite(return), 9999, return))
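
As a quick check of the arithmetic this approach relies on, here is a minimal sketch using the question's data (the name `raw` is just illustrative). It also shows why is.nan() is the right test here rather than is.na(): is.na() is TRUE for both NA and NaN, so it could not tell a missing input apart from 0/0.

```r
m2 <- c(22, NA, 0, NA, 42, NA)
m3 <- c(89, 38, 0, 67, 0, NA)

# Plain formula, no special cases yet
raw <- (m2 - m3) / m3

is.nan(raw)       # FALSE FALSE  TRUE FALSE FALSE FALSE  (only the 0/0 row)
is.infinite(raw)  # FALSE FALSE FALSE FALSE  TRUE FALSE  (only the 42/0 row)
is.na(raw)        # FALSE  TRUE  TRUE  TRUE FALSE  TRUE  (NA rows and the NaN row)

# A negative numerator over zero gives -Inf, which is.infinite() also catches
is.infinite((-5 - 0) / 0)  # TRUE
```

So a negative m2 with m3 == 0 would also end up as 9999, which matches condition 3 in the question.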

Upvotes: 1

arg0naut91

Reputation: 14774

If you want readability, case_when from dplyr could be an option:

library(dplyr)

df %>%
  mutate(
    result = case_when(
      is.na(m2) | is.na(m3) ~ NA_real_,
      m2 == 0 & m3 == 0 ~ -9999,  # condition 2 in the question
      m2 != 0 & m3 == 0 ~ 9999,   # condition 3 in the question
      TRUE ~ (m2 - m3) / m3
    )
  )

As @markus added, you could indeed skip the first condition (the is.na() line) and get the same output, since the formula already yields NA when either input is NA.

I'd also suggest reading the ?case_when help page to get acquainted with some specifics (such as the relevance of order, skipping TRUE, why NA_real_ is used in the above case, etc.).
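
To make a couple of those specifics concrete, here is a small sketch on a toy vector (the vector `x` and the labels are made up for illustration, not taken from the question). It assumes dplyr is installed. On NA_real_: older dplyr versions required every right-hand side to have the same type, which is why a typed NA is used above; as far as I know, recent versions are more lenient about a bare NA, but NA_real_ works either way.

```r
library(dplyr)

x <- c(0, 5, NA)

# Conditions are checked top to bottom and the first TRUE one wins.
# A condition that evaluates to NA counts as "no match".
first_wins <- case_when(
  x >= 0 ~ "non-negative",
  x == 0 ~ "zero",        # never reached: 0 already matched the line above
  TRUE   ~ "no match"     # catch-all, also picks up the NA element
)
first_wins  # "non-negative" "non-negative" "no match"

# Without a TRUE catch-all, unmatched elements come back as NA
no_default <- case_when(x == 0 ~ -9999)
no_default  # -9999 NA NA
```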

Upvotes: 3

Chase

Reputation: 69231

I would break this into two steps:

m2 <- c(22,NA,0,NA,42,NA)
m3 <- c(89,38,0,67,0,NA)
df = data.frame(m2,m3)


df$return <- with(df, (m2 - m3)/m3)
df$return <- with(df, ifelse(m2 == 0 & m3 == 0, -9999, ifelse(m2 != 0 & m3 == 0, 9999, return)))

Created on 2019-01-24 by the reprex package (v0.2.1)

Two things to note here: 1) use ifelse() since it is vectorized (i.e. it operates on all rows of df at once, so you don't have to write a for loop), and 2) R naturally yields NA if m2 or m3 is NA, so you only need to spell out the conditions under which return should equal 9999 or -9999.
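
For contrast with the loop in the question, here is a rough sketch of why if() raised the error while ifelse() does not (the names `cond`, `err`, `out` and `idx` are only for illustration). It also shows that `for (i in nrow(df))` iterates over the single value 6 rather than over 1 to 6, which is probably not what was intended.

```r
m2 <- c(22, NA, 0, NA, 42, NA)
m3 <- c(89, 38, 0, 67, 0, NA)

# Comparing NA with a number gives NA rather than TRUE or FALSE
cond <- m2 == 0 & m3 == 0
cond[6]  # NA

# if() needs a single TRUE/FALSE, so an NA condition raises an error;
# tryCatch() captures the error message instead of stopping the script
err <- tryCatch(
  if (cond[6]) "both zero",
  error = function(e) conditionMessage(e)
)

# ifelse() works on the whole vector and simply passes the NA through
out <- ifelse(cond, -9999, 0)
out  # 0 0 -9999 0 0 NA

# nrow(df) is one number (6 here); seq_len() gives every row index
idx <- seq_len(length(m2))
idx  # 1 2 3 4 5 6
```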

Upvotes: 2

user10624646

m2 <- c(22,NA,0,NA,42,NA)
m3 <- c(89,38,0,67,0,NA)
df = data.frame(m2,m3)


library(tidyverse)

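# Inside mutate() the columns can be referenced directly, without df$.
# Signs follow the conditions stated in the question: 0/0 -> -9999, x/0 -> 9999.
df %>%
  mutate(return = ifelse(is.na(m2) | is.na(m3), NA,
                  ifelse(m2 == 0 & m3 == 0, -9999,
                  ifelse(m3 == 0 & m2 != 0, 9999, (m2 - m3) / m3))))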

Upvotes: 1
