Calculation with multiple column conditional on data being there

Question

I am trying to solve is how to calculate the weighted score for each class each month.

Each class has multiple students and the weight (contribution) of a student's score varies through time.

To be included in the calculation a student must have both score and weight.

I am a bit lost and none of the approaches I have used have worked.

 Student Class Jan_18_score Feb_18_score Jan_18_Weight Feb_18_Weight
    Adam   1       3             2           150           153 
    Char   1       5             7           30            60
    Fred   1       -7            8           NA            80
    Greg   1       2             NA          80            40
    Ed     2       1             2           60            80
    Mick   2       NA            6           80            30
    Dave   3       5             NA          40            25
    Nick   3       8             8           12            45  
    Tim    3       -2            7           23            40 
    George 3       5             3           65            NA
    Tom    3       NA            8           78            50

The overall goal is to calculate the weighted score for each class each month.

Taking Class 1 (first 4 rows) as an example and looking at Jan_18.

-The observations of Adam, Char and Greg are valid since they have both scores and weights. Their scores and weights should be included - Fred does not have a Jan_18_weight, therefore both his Jan_18_score and Jan_18_weight are excluded from the calculation.

The following calculation should then occur:

= [(3*150)+(5*30)+(2*80)]/ [150+30+80] = 2.92307

This calculation would be repeated for each class and each month.

A new dataframe something like the following should be the output

  Class Jan_18_Weight_Score Feb_18_Weight_Score 
    1            2.92307           etc
    2             etc              etc
    3             etc              etc

There are many columns and many rows.

Any help is appreciated.

Ronak Shah · Accepted Answer

Maybe this could be solved in a much better way but here is one Base R option where we perform aggregation twice and then combine the results.

#Separate score and weight columns
score_cols <- grep("score$", names(df))
weight_cols <- grep("Weight$", names(df))

#Replace NA's in corresponding score and weight columns to 0
inds <- is.na(df[score_cols]) | is.na(df[weight_cols]) 
df[score_cols][inds] <- 0
df[weight_cols][inds] <- 0


#Find sum of weight columns for each class
df1 <- aggregate(.~Class, cbind(df["Class"], df[weight_cols]), sum)

#find sum of multiplication of score and weight columns for each class
df2 <- aggregate(.~Class, cbind(df["Class"], df[score_cols] * df[weight_cols]), sum)

#Get the ratio between two dataframes.
cbind(df1[1], df2[-1]/df1[-1])

#  Class Jan_18_score Feb_18_score
#1     1         2.92         4.66
#2     2         1.00         3.09
#3     3         4.11         7.70

Calculation with multiple column conditional on data being there

Answers (2)

Related Questions