ASF

Reputation: 279

R - subtracting different columns if condition is met

I have a huge data frame that looks like this:

df = data.frame(A = c(1,54,23,2), B=c(1,2,4,65), C=c("+","-","-","+"))

> df
   A  B C
1  1  1 +
2 54  2 -
3 23  4 -
4  2 65 +

I need to subtract one column from the other, with the order depending on a condition, and store the result in a new column:

 A - B if C == +
 B - A if C == -

So, my output would be:

> new_df
   A  B C   D
1  1  1 +   0
2 54  2 - -52
3 23  4 - -19
4  2 65 + -63

Upvotes: 2

Views: 3339

Answers (6)

moodymudskipper

Reputation: 47300

A base solution:

df$D = (df$B-df$A)*sign((df$C=="-")-0.5)
#    A  B C   D
# 1  1  1 +   0
# 2 54  2 - -52
# 3 23  4 - -19
# 4  2 65 + -63

This can also be written as df <- transform(df, D = (B-A)*sign((C=="-")-0.5))
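To see why the sign() expression works, it may help to look at the intermediate multiplier on its own. A small sketch using the data frame from the question (the variable name multiplier is only for illustration):

```r
df <- data.frame(A = c(1, 54, 23, 2), B = c(1, 2, 4, 65), C = c("+", "-", "-", "+"))

# (df$C == "-") is TRUE/FALSE, which behaves as 1/0 in arithmetic;
# subtracting 0.5 gives 0.5 or -0.5, and sign() maps that to 1 or -1
multiplier <- sign((df$C == "-") - 0.5)
multiplier
# [1] -1  1  1 -1

# B - A is kept where C is "-" and negated (giving A - B) where C is "+"
(df$B - df$A) * multiplier
# [1]   0 -52 -19 -63
```

The trick relies on C containing only "+" and "-"; any other value is treated like "+".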

Upvotes: 0

MathLal

Reputation: 402

This answer should work for you: https://stackoverflow.com/a/19000310/6395612

You can use with like this:

df['D'] = with(df, ifelse(C=='+', A - B, B - A))

Upvotes: 0

Thomas Guillerme

Reputation: 1857

Alternatively, if you want to evaluate the arithmetic operator stored in column C literally (that is, compute A + B or A - B), you can use eval(parse(text = ...)) (more about that here: Evaluate expression given as a string).

## Transforming into a matrix (simplifies everything into characters)
df_mat <- as.matrix(df)

## Function for evaluating one row as an arithmetic expression
eval.row <- function(row) {
    eval(parse(text= paste(row[1], row[3], row[2])))
}

## For the first row
eval.row(df_mat[1,])
# [1] 2

## For the whole data frame
apply(df_mat, 1, eval.row)
# [1]  2 52 19 67

## Updating the data.frame
df$D <- apply(df_mat, 1, eval.row)
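If the goal is to apply the operator stored in C without building and parsing strings, one possible alternative (a sketch, not part of the answer above) is to look the operator up with match.fun(). The helper name apply_op is made up for illustration.

```r
df <- data.frame(A = c(1, 54, 23, 2), B = c(1, 2, 4, 65), C = c("+", "-", "-", "+"))

# Hypothetical helper: look up the function named by `op` and call it on a and b
apply_op <- function(a, op, b) match.fun(op)(a, b)

# as.character() guards against C being stored as a factor
res <- mapply(apply_op, df$A, as.character(df$C), df$B)
res
# [1]  2 52 19 67
```

This keeps A and B numeric throughout, so there is no conversion to a character matrix and back.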

Upvotes: 0

Matt W.

Reputation: 3722

Using dplyr:

If there are definitely only + and - in the C column you can do:

library(dplyr)

df2 <- df %>%
     mutate(D = ifelse(C == '+', A - B, B - A))

I would generally do:

df2 <- df %>%
     mutate(D = ifelse(C == '+', A - B,
                ifelse(C == '-', B - A, NA)))

This covers the case where some rows contain neither + nor - in C; those rows get NA instead of a misleading value.
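The effect of the extra branch can be checked on a small example. This sketch uses base R's with() rather than mutate() so that it runs without extra packages; the data frame df_extra and its "*" row are made up for illustration.

```r
# Example data with one unexpected symbol ("*") in C; made up for illustration
df_extra <- data.frame(A = c(1, 54, 7), B = c(1, 2, 3), C = c("+", "-", "*"))

# Same nested ifelse() as above, evaluated with base R's with()
D <- with(df_extra, ifelse(C == "+", A - B,
                    ifelse(C == "-", B - A, NA)))
D
# [1]   0 -52  NA
```

With the single ifelse() version, the "*" row would silently be computed as B - A.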

Upvotes: 1

neilfws

Reputation: 33772

It is better to add stringsAsFactors = FALSE when you create a data frame (this has been the default since R 4.0.0). Also, I don't like to use df as a variable name, since there is a df() function:

df1 <- data.frame(A = c(1, 54, 23, 2), 
                  B = c(1, 2, 4, 65), 
                  C = c("+", "-", "-", "+"), 
                  stringsAsFactors = FALSE)

Assuming that C is only + or -, you can use dplyr::mutate() and test using ifelse():

library(dplyr)
df1 %>% 
  mutate(D = ifelse(C == "+", A - B, B - A))

Upvotes: 2

www

Reputation: 39154

This assumes that only two conditions, + and -, are in column C.

df$D <- with(df, ifelse(C %in% "+", A - B, B - A))
df
#    A  B C   D
# 1  1  1 +   0
# 2 54  2 - -52
# 3 23  4 - -19
# 4  2 65 + -63
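One practical difference between %in% and == shows up when C contains missing values. A short sketch, assuming a made-up data frame df_na with an NA in C (the names d_eq and d_in are only for illustration):

```r
# Example data with a missing value in C; made up for illustration
df_na <- data.frame(A = c(1, 54, 7), B = c(1, 2, 3), C = c("+", "-", NA))

# With ==, the comparison NA == "+" is NA, so ifelse() returns NA for that row
d_eq <- with(df_na, ifelse(C == "+", A - B, B - A))
d_eq
# [1]   0 -52  NA

# With %in%, NA %in% "+" is FALSE, so the row falls through to B - A
d_in <- with(df_na, ifelse(C %in% "+", A - B, B - A))
d_in
# [1]   0 -52  -4
```

Which behaviour is preferable depends on whether a missing symbol should propagate as NA or be treated as "-".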

Upvotes: 3
