Johanna
Johanna

Reputation: 1039

Summing multiple columns in R based on multiple other columns

This is a sample of my data:

row   Grade1   Grade2   Grade3   Points1   Points2   Points3
1     6        4.5      NA       7         5         4
2     NA       7        8        7         5         4

Per row, I want the sum of the Points columns, whenever the corresponding outcome in the grades column is 5.5 or higher. So, for row 1, the outcome would be 7 (only Grade1 is 5.5 or higher, and Grade1 corresponds to Points1). For row 2, the outcome would be 9: Grade2 and Grade 3 are 5.5 or higher, so we sum Points2 (5) and Points3 (4).

How can I get R to do this?

Upvotes: 0

Views: 45

Answers (1)

stas g
stas g

Reputation: 1513

dat <- data.frame(Grade1 = c(6, NA), #create data
  Grade2 = c(4.5, 7), Grade3 = c(NA, 8), Points1 = c(7, 7), 
  Points2 = c(5, 5), Points3 = c(4, 4))

R matrix-wise operations are your friends. dat[, c('Grade1', 'Grade2', 'Grade3')] > 5.5 is a matrix of TRUE/FALSE/NA for condition "grade > 5.5", which we then multiply by a matrix of grades.

tmp <- (dat[, c('Grade1', 'Grade2', 'Grade3')] > 5.5) *
 dat[, c('Points1', 'Points2', 'Points3')]

what we get is a matrix where only points corresponding to grades > 5.5 survive:

tmp
  #Points1 Points2 Points3
#1       7       0      NA
#2      NA       5       4

finally sum the surviving points to get the desired result:

rowSums(tmp, na.rm = TRUE)
# [1] 7 9

Upvotes: 1

Related Questions