houstonsaxe

Reputation: 11

Using apply() to perform operations on one column based on information from another column in R

I have a data frame, vinaSums2:

Treatment Dose (ppm)    variable sum sum2 percent score
1                     0       #DFA1  12   19    0.63    NA
2                     0       #DFA2   6   19    0.32    NA
3                     0       #DFA3   0   19    0.00    NA
4                     0       #DFA4   1   19    0.05    NA
5                     0 Bug/Discard   0   19    0.00    NA
6                   150       #DFA1  12   20    0.60    NA
7                   150       #DFA2   5   20    0.25    NA
8                   150       #DFA3   3   20    0.15    NA
9                   150       #DFA4   0   20    0.00    NA
10                  150 Bug/Discard   0   20    0.00    NA
11                  300       #DFA1  14   19    0.74    NA
12                  300       #DFA2   1   19    0.05    NA
13                  300       #DFA3   2   19    0.11    NA
14                  300       #DFA4   1   19    0.05    NA
15                  300 Bug/Discard   1   19    0.05    NA

I would like to transform values of vinaSums2$sum based on their respective values for vinaSums2$variable and place this in vinaSums2$score. I have tried variations of this:

 vinaSums2$score = apply(vinaSums2[,c(2,3)], 1, function(x){ifelse(x[1] == "#DFA1", x[2]*1, ifelse(x[1] == "#DFA2", x[2]*0.8, ifelse(x[1] == "#DFA3", x[2]*0.6, ifelse(x[1] == "#DFA4", x[2]*0.4, x[2]*0.2))))})

Which results in the error:

Error during wrapup: non-numeric argument to binary operator
Error: no more error handlers available (recursive errors?); invoking 'abort' restart

I don't understand how apply() works with the function and the data. In my mind, x[1] should return the "variable" for that row and x[2] should return the "sum" for that row, but I obviously don't know what apply() is doing under the hood.

Any help would be appreciated

Upvotes: 1

Views: 40

Answers (2)

Onyambu

Reputation: 79228

In base R, you could use a named vector as a lookup table:

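# Named lookup vector: one weight per known value of `variable`
vec <- c("#DFA1" = 1, "#DFA2" = 0.8, "#DFA3" = 0.6, "#DFA4" = 0.4)
# as.character() so a factor column is matched by name, not by integer code
vs <- unname(vec[as.character(vinaSums2$variable)])
vs[is.na(vs)] <- 0.2  # anything not in vec (e.g. "Bug/Discard") gets 0.2
vinaSums2$score <- vs * vinaSums2$sum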

vinaSums2
   Treatment.Dose..ppm.    variable sum sum2 percent score
1                     0       #DFA1  12   19    0.63  12.0
2                     0       #DFA2   6   19    0.32   4.8
3                     0       #DFA3   0   19    0.00   0.0
4                     0       #DFA4   1   19    0.05   0.4
5                     0 Bug/Discard   0   19    0.00   0.0
6                   150       #DFA1  12   20    0.60  12.0
7                   150       #DFA2   5   20    0.25   4.0
8                   150       #DFA3   3   20    0.15   1.8
9                   150       #DFA4   0   20    0.00   0.0
10                  150 Bug/Discard   0   20    0.00   0.0
11                  300       #DFA1  14   19    0.74  14.0
12                  300       #DFA2   1   19    0.05   0.8
13                  300       #DFA3   2   19    0.11   1.2
14                  300       #DFA4   1   19    0.05   0.4
15                  300 Bug/Discard   1   19    0.05   0.2
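
One caveat with name-based lookup is worth spelling out: if the column is stored as a factor, indexing with it uses the factor's integer codes rather than its labels. The sketch below uses a small made-up factor (the `variable`, `by_code`, and `by_name` objects are hypothetical and not part of vinaSums2) to show the difference; it is an illustration under those assumptions, not a statement about how the asker's data is stored.

```r
# Same lookup vector as in the answer above
vec <- c("#DFA1" = 1, "#DFA2" = 0.8, "#DFA3" = 0.6, "#DFA4" = 0.4)

# Hypothetical factor column with an explicit (arbitrary) level order
variable <- factor(c("#DFA4", "Bug/Discard", "#DFA1"),
                   levels = c("Bug/Discard", "#DFA4", "#DFA1"))

# Indexing with a factor uses its integer codes (2, 1, 3), not its labels,
# so the weights picked here are the wrong ones
by_code <- unname(vec[variable])

# Converting to character first looks the weights up by name;
# labels missing from vec come back as NA
by_name <- unname(vec[as.character(variable)])
by_name[is.na(by_name)] <- 0.2   # default weight for unmatched labels
```

If the column is already character, the as.character() call is a no-op, so it is cheap insurance either way.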

Using tidyverse you could do:

library(tidyverse)
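vec <- c("#DFA1" = 1, "#DFA2" = 0.8, "#DFA3" = 0.6, "#DFA4" = 0.4)
vinaSums2 %>%
  # relabel each known level with its weight; unknown levels become NA
  mutate(v = factor(variable, names(vec), vec),
         # turn NA into the default "0.2" (forcats >= 1.0.0 deprecates
         # fct_explicit_na() in favour of fct_na_value_to_level())
         v = as.numeric(as.character(fct_explicit_na(v, "0.2"))),
         score = v * sum) %>%
  select(-v)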

Upvotes: 1

Ronak Shah

Reputation: 388982

You don't need apply() here: ifelse() is vectorized, so it can operate on whole columns at once.

vinaSums2$score <- with(vinaSums2,
  ifelse(variable == "#DFA1", sum,
  ifelse(variable == "#DFA2", sum * 0.8,
  ifelse(variable == "#DFA3", sum * 0.6,
  ifelse(variable == "#DFA4", sum * 0.4, sum * 0.2)))))
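
As for what apply() was doing under the hood in the original attempt: when given a data frame, it first converts it to a matrix with as.matrix(), and a matrix can hold only one type. With a character column present, the numeric column becomes character as well, which is why x[2] * 1 fails with "non-numeric argument to binary operator". A minimal sketch on a made-up two-row data frame (`df`, `m`, `row_types`, and `scores` are hypothetical names used only for illustration):

```r
# Toy data frame (hypothetical values): one character and one numeric column
df <- data.frame(variable = c("#DFA1", "#DFA2"),
                 sum = c(12, 6),
                 stringsAsFactors = FALSE)

# apply() calls as.matrix() on a data frame; because a matrix holds a single
# type, the numeric column is converted to character
m <- as.matrix(df)
is.character(m)

# Each row therefore reaches the function as a character vector
row_types <- apply(df, 1, function(x) class(x))

# Converting back explicitly makes the arithmetic work, although a
# vectorized solution avoids the round trip entirely
scores <- apply(df, 1, function(x) as.numeric(x[["sum"]]) * 0.8)
```

The explicit as.numeric() is shown only to make the coercion visible; the vectorized versions in these answers sidestep the problem because each column keeps its own type.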

You can also use case_when() from dplyr, which is easier to read when you have a lot of such conditions.

library(dplyr)

vinaSums2 %>%
  mutate(score = case_when(variable == "#DFA1" ~ sum * 1,
                           variable == "#DFA2" ~ sum * 0.8,
                           variable == "#DFA3" ~ sum * 0.6,
                           variable == "#DFA4" ~ sum * 0.4,
                           TRUE ~ sum * 0.2))

Upvotes: 2
