Calculate the differences between averages in paired data in R

Question

I have a data frame as below:

df <- data.frame(Staff = c("Jack", "Ruth", "Michael", "Ruth", "Jack", "Jack", "Ruth", "Michael"), 
                 Client = c("Julie", "Julie", "Julie", "Julie", "Julie", "Candice", "Candice", "Candice"),
                 Assessment = c(1, 2, 2, 1, 7, 4, 1, 1), 
                 Staff_avg_by_client = c(4, 1.5, 2, 1.5, 4, 4, 1, 1))

Ian Campbell · Accepted Answer

Here's an approach with data.table (since that's what you asked for):

We can use by = seq(1,nrow(df)) to work on every row.

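Then, for each row, we can subset df to the rows that share that row's Client but have a different Staff, using the .SD special symbol to refer to the current row. So for row 1, .SD[,Staff] evaluates to "Jack" and .SD[,Client] evaluates to "Julie", and the subset holds the assessments that other staff made of Julie.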
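
As a quick check of what that subset means for a single row, here is a minimal base R sketch, assuming the data frame from the question. The names i, others, other_mean and row_diff are made up for illustration and are not part of the answer's code.

```r
# Same data as in the question
df <- data.frame(
  Staff = c("Jack", "Ruth", "Michael", "Ruth", "Jack", "Jack", "Ruth", "Michael"),
  Client = c("Julie", "Julie", "Julie", "Julie", "Julie", "Candice", "Candice", "Candice"),
  Assessment = c(1, 2, 2, 1, 7, 4, 1, 1),
  Staff_avg_by_client = c(4, 1.5, 2, 1.5, 4, 4, 1, 1)
)

i <- 1  # row being checked (Jack, Julie)

# Rows for the same client but a different staff member
others <- df$Staff != df$Staff[i] & df$Client == df$Client[i]

# Mean assessment given to Julie by everyone except Jack
other_mean <- mean(df$Assessment[others])

# Jack's own average for Julie minus the other staff's average
row_diff <- df$Staff_avg_by_client[i] - other_mean
print(row_diff)
```

The data.table code does the same thing for every row in turn.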

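library(data.table)
setDT(df)  # convert df to a data.table by reference
# For each row: own average minus the mean Assessment of the
# other staff for the same client
df[, Diff := Staff_avg_by_client -
     df[Staff != .SD[,Staff] & Client == .SD[,Client], mean(Assessment)],
   by = seq(1,nrow(df))][]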
   Month   Staff  Client Assessment Staff_avg_by_client      Diff
1:     1    Jack   Julie          1                 4.0  2.333333
2:     1    Ruth   Julie          2                 1.5 -1.833333
3:     1 Michael   Julie          2                 2.0 -0.750000
4:     1    Ruth   Julie          1                 1.5 -1.833333
5:     1    Jack   Julie          7                 4.0  2.333333
6:     1    Jack Candice          4                 4.0  3.000000
7:     1    Ruth Candice          1                 1.0 -1.500000
8:     1 Michael Candice          1                 1.0 -1.500000

The final [] is just to print the data.table after assigning by reference.
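
If the row-by-row subsetting turns out to be slow on a larger table, one possible alternative (a different technique from the one above, offered only as a sketch) is to derive the other staff's mean from group totals: (client total minus own total) divided by (client count minus own count). The names dt, Diff2 and the helper columns client_sum, client_n, pair_sum and pair_n are hypothetical.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  Staff = c("Jack", "Ruth", "Michael", "Ruth", "Jack", "Jack", "Ruth", "Michael"),
  Client = c("Julie", "Julie", "Julie", "Julie", "Julie", "Candice", "Candice", "Candice"),
  Assessment = c(1, 2, 2, 1, 7, 4, 1, 1),
  Staff_avg_by_client = c(4, 1.5, 2, 1.5, 4, 4, 1, 1)
)

# Total and count of assessments per client
dt[, `:=`(client_sum = sum(Assessment), client_n = .N), by = Client]

# Total and count of assessments per staff-and-client pair
dt[, `:=`(pair_sum = sum(Assessment), pair_n = .N), by = .(Staff, Client)]

# Mean of the other staff = (client total - own total) / (client count - own count)
dt[, Diff2 := Staff_avg_by_client - (client_sum - pair_sum) / (client_n - pair_n)]

# Drop the helper columns
dt[, c("client_sum", "client_n", "pair_sum", "pair_n") := NULL]
print(dt)
```

When a client has been assessed by only one staff member, the division is 0/0 and Diff2 is NaN, which matches what mean() returns for an empty subset in the row-wise version.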
