Reputation: 475
Hi guys I have this Dataframe in R:
m2 <- c(22,NA,0,NA,42,NA)
m3 <- c(89,38,0,67,0,NA)
df = data.frame(m2,m3)
I want to calculate the return between m3 and m2. The formula is: return = (m2[i] - m3[i])/m3[i]. The conditions for the calculation are:
So far I have tried this code:
for (i in nrow(df)){
  if (is.na(df[['m2']][i]) == TRUE | is.na(df[['m3']][i]) == TRUE){df[['result']][i] = NA}
  if (df[['m2']][i] == 0 & df[['m3']][i] == 0) {df[['result']][i] = 9999}
  if (df[['m3']][i] == 0 | df[['m2']][i] != 0) {df[['result']][i] = -9999}
  else {df[['result']][i] = (df[['m2']][i] - df[['m3']][i])/df[['m3']][i]}
}
But it fails with the following error:
Error in if (df[["m2"]][i] == 0 & df[["m3"]][i] == 0) { :
  missing value where TRUE/FALSE needed
I have tried the same method in Python and it works. Is there a way to do this in R, and should I calculate the returns without a for loop?
Upvotes: 0
Views: 371
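For comparison with the answers below, here is a minimal sketch of the question's loop with its bugs fixed. It assumes the intended rules are the ones in the question's code, reading the | in the third if as & (9999 when both values are 0, -9999 when only m3 is 0). Three things change: the loop visits every row via seq_len(nrow(df)) instead of only the last one, the independent if statements become one if / else if chain so no comparison is ever evaluated on an NA, and the result column is pre-allocated. The local names a and b are only for brevity.

```r
# Sample data from the question
m2 <- c(22, NA, 0, NA, 42, NA)
m3 <- c(89, 38, 0, 67, 0, NA)
df <- data.frame(m2, m3)

# Pre-allocate the result column before the loop writes into it
df$result <- NA_real_

# seq_len(nrow(df)) visits rows 1..6; `for (i in nrow(df))` only visits row 6
for (i in seq_len(nrow(df))) {
  a <- df$m2[i]
  b <- df$m3[i]
  # An if / else if chain stops at the first matching branch, so the
  # comparisons further down never see an NA value
  if (is.na(a) || is.na(b)) {
    df$result[i] <- NA_real_
  } else if (a == 0 && b == 0) {
    df$result[i] <- 9999
  } else if (b == 0) {
    # m3 is 0 and m2 is not (the previous branch already handled m2 == 0)
    df$result[i] <- -9999
  } else {
    df$result[i] <- (a - b) / b
  }
}

df
```

In practice the vectorized approaches below are preferable; the loop is mainly useful for seeing where the original error came from.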
Reputation:
What your logic is really stating is that you want to change:
NaN to -9999 (occurs for 0/0)
Inf to 9999 (occurs for x/0)
So you can simply apply your formula to every row and substitute the special values afterwards. To me that reads as less convoluted; I avoid if-then logic whenever possible.
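To see why this substitution works, here is a small check of what R returns when dividing by zero (the vector x is just illustrative). Note that a negative numerator gives -Inf, which is.infinite() also catches, so both signs end up mapped to the same value.

```r
# Division by zero does not raise an error in R; it returns special values
x <- c(0, 5, -5, NA) / 0
print(x)               # NaN, Inf, -Inf, NA

# is.nan() flags only NaN, while is.na() flags both NaN and NA,
# which is why the answer tests is.nan() rather than is.na()
print(is.nan(x))       # TRUE FALSE FALSE FALSE
print(is.na(x))        # TRUE FALSE FALSE TRUE

# is.infinite() is TRUE for Inf and -Inf alike
print(is.infinite(x))  # FALSE TRUE TRUE FALSE
```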
Base R solution:
df$return <- (df$m2 - df$m3) / df$m3
# 0/0 gives NaN
df[is.nan(df$return),"return"] <- -9999
# a nonzero value divided by 0 gives Inf or -Inf
df[is.infinite(df$return),"return"] <- 9999
dplyr solution:
library(dplyr)
df %>%
  mutate(return = (m2 - m3) / m3,
         return = if_else(is.nan(return), -9999, return),
         return = if_else(is.infinite(return), 9999, return))
Upvotes: 1
Reputation: 14774
If you want readability, case_when from dplyr could be an option:
library(dplyr)
df %>%
  mutate(
    result = case_when(
      is.na(m2) | is.na(m3) ~ NA_real_,
      m2 == 0 & m3 == 0 ~ 9999,
      m2 != 0 & m3 == 0 ~ -9999,
      TRUE ~ (m2 - m3) / m3
    )
  )
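As @markus pointed out, you could indeed skip the first condition (the is.na() line) and get the same output: a missing value makes the other conditions FALSE or NA, neither of which counts as a match, so those rows fall through to the TRUE branch, where (m2 - m3) / m3 is NA anyway.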
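A small base R check of why the is.na() condition is redundant, using the question's vectors (the names both_zero, nonzero_over_zero and fallback are just for illustration). A comparison involving NA is NA, but & returns FALSE as soon as one side is FALSE, and arithmetic on NA gives NA.

```r
# Sample vectors from the question
m2 <- c(22, NA, 0, NA, 42, NA)
m3 <- c(89, 38, 0, 67, 0, NA)

# Rows 2 and 4 have m3 != 0, so `&` gives a plain FALSE despite the NA in m2;
# only row 6 (both inputs missing) gives NA
both_zero <- m2 == 0 & m3 == 0
nonzero_over_zero <- m2 != 0 & m3 == 0
print(both_zero)          # FALSE FALSE TRUE FALSE FALSE NA
print(nonzero_over_zero)  # FALSE FALSE FALSE FALSE TRUE NA

# Arithmetic propagates NA, so the catch-all formula is NA
# for every row that has a missing input
fallback <- (m2 - m3) / m3
print(fallback[is.na(m2) | is.na(m3)])  # NA NA NA
```

As far as I know, case_when() treats an NA condition as not matched, so rows 2, 4 and 6 all reach the TRUE branch and come out as NA.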
I'd also suggest reading the ?case_when help page to get acquainted with some specifics (such as why the order of the conditions matters, what happens if you omit the final TRUE, why NA_real_ is used in the above case, etc.).
Upvotes: 3
Reputation: 69231
I would break this into two steps:
m2 <- c(22,NA,0,NA,42,NA)
m3 <- c(89,38,0,67,0,NA)
df = data.frame(m2,m3)
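# Step 1: apply the formula to every row (0/0 gives NaN, x/0 gives Inf or -Inf)
df$return <- with(df, (m2 - m3)/m3)
# Step 2: overwrite the division-by-zero cases; inside with(), `return` is the df$return column
df$return <- with(df, ifelse(m2 == 0 & m3 == 0, -9999,
                      ifelse(m2 != 0 & m3 == 0, 9999, return)))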
Created on 2019-01-24 by the reprex package (v0.2.1)
The things to note here are 1) use ifelse(), since it is vectorized (i.e. it operates on all rows of df naturally, so you avoid having to code a for loop), and 2) R naturally yields NA when m2 or m3 is NA, so you only need to spell out the conditions under which return should equal 9999 or -9999.
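The difference between if and ifelse() is also what caused the error in the question. A minimal sketch (the vector x is illustrative): if needs a single TRUE or FALSE and stops on NA, whereas ifelse() works element by element and simply returns NA where the condition is NA.

```r
# `if` needs a single TRUE/FALSE; an NA condition reproduces the question's error
msg <- tryCatch(
  if (NA == 0) "zero" else "non-zero",
  error = function(e) conditionMessage(e)
)
print(msg)  # "missing value where TRUE/FALSE needed"

# ifelse() is vectorized: it checks the condition element by element
# and returns NA wherever the condition itself is NA
x <- c(0, 5, NA)
res <- ifelse(x == 0, "zero", "non-zero")
print(res)  # "zero" "non-zero" NA
```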
Upvotes: 2
Reputation:
m2 <- c(22,NA,0,NA,42,NA)
m3 <- c(89,38,0,67,0,NA)
df = data.frame(m2,m3)
library(tidyverse)
df %>%
  mutate(return = ifelse(is.na(m2) | is.na(m3), NA,
                  ifelse(m2 == 0 & m3 == 0, 9999,
                  ifelse(m3 == 0 & m2 != 0, -9999,
                         (m2 - m3) / m3))))
Upvotes: 1