Reputation: 37
I have a data frame as below:
df <- data.frame(Staff = c("Jack", "Ruth", "Michael", "Ruth", "Jack", "Jack", "Ruth", "Michael"),
                 Client = c("Julie", "Julie", "Julie", "Julie", "Julie", "Candice", "Candice", "Candice"),
                 Assessment = c(1, 2, 2, 1, 7, 4, 1, 1),
                 Staff_avg_by_client = c(4, 1.5, 2, 1.5, 4, 4, 1, 1))
Upvotes: 2
Views: 66
Reputation: 389135
For every Client, we can subtract the mean of Assessment for all other Staff from Staff_avg_by_client. Using dplyr (with map_dbl from purrr), this can be done as:
library(dplyr)
library(purrr)

df %>%
  group_by(Client) %>%
  # for each row within a Client, exclude that row's Staff before averaging
  mutate(diff = map_dbl(row_number(),
         ~Staff_avg_by_client[.x] - mean(Assessment[Staff != Staff[.x]])))
# Month Staff Client Assessment Staff_avg_by_client diff
# <dbl> <chr> <chr> <dbl> <dbl> <dbl>
#1 1 Jack Julie 1 4 2.33
#2 1 Ruth Julie 2 1.5 -1.83
#3 1 Michael Julie 2 2 -0.75
#4 1 Ruth Julie 1 1.5 -1.83
#5 1 Jack Julie 7 4 2.33
#6 1 Jack Candice 4 4 3
#7 1 Ruth Candice 1 1 -1.5
#8 1 Michael Candice 1 1 -1.5
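As a cross-check on the numbers above, the mean over all other Staff can also be written as (Client total minus this Staff's total) divided by (Client count minus this Staff's count), which needs no row-by-row loop. The following is only a sketch in base R using ave(); the names client_sum, client_n, pair_sum, pair_n and diff_check are made up for illustration and are not part of the code above.

```r
# Example data copied from the question
df <- data.frame(
  Staff = c("Jack", "Ruth", "Michael", "Ruth", "Jack", "Jack", "Ruth", "Michael"),
  Client = c("Julie", "Julie", "Julie", "Julie", "Julie", "Candice", "Candice", "Candice"),
  Assessment = c(1, 2, 2, 1, 7, 4, 1, 1),
  Staff_avg_by_client = c(4, 1.5, 2, 1.5, 4, 4, 1, 1)
)

# Sum and count of Assessment per Client, repeated on every row
client_sum <- ave(df$Assessment, df$Client, FUN = sum)
client_n   <- ave(df$Assessment, df$Client, FUN = length)

# Sum and count of Assessment per Client/Staff pair, repeated on every row
pair_sum <- ave(df$Assessment, df$Client, df$Staff, FUN = sum)
pair_n   <- ave(df$Assessment, df$Client, df$Staff, FUN = length)

# Mean Assessment of all *other* Staff for the same Client
others_mean <- (client_sum - pair_sum) / (client_n - pair_n)

# Hypothetical check column; it should match the diff column above
df$diff_check <- df$Staff_avg_by_client - others_mean
df
```

For Jack and Julie, for instance, this gives 4 - (13 - 8) / (5 - 2) = 2.33. If a Client has only one Staff member the denominator is zero and the result is NaN, which is also what mean() returns for an empty vector.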
Upvotes: 2
Reputation: 24838
Here's an approach with data.table (since that's what you asked for):
We can use by = seq(1,nrow(df)) to work on every row. Then, for each row, we can subset df by that row's Staff and Client using the .SD special symbol. So for row 1, .SD[,Staff] evaluates to "Jack" and .SD[,Client] evaluates to "Julie".
library(data.table)
setDT(df)

df[, Diff := Staff_avg_by_client -
       df[Staff != .SD[,Staff] & Client == .SD[,Client], mean(Assessment)],
   by = seq(1,nrow(df))][]
Month Staff Client Assessment Staff_avg_by_client Diff
1: 1 Jack Julie 1 4.0 2.333333
2: 1 Ruth Julie 2 1.5 -1.833333
3: 1 Michael Julie 2 2.0 -0.750000
4: 1 Ruth Julie 1 1.5 -1.833333
5: 1 Jack Julie 7 4.0 2.333333
6: 1 Jack Candice 4 4.0 3.000000
7: 1 Ruth Candice 1 1.0 -1.500000
8: 1 Michael Candice 1 1.0 -1.500000
The final [] is just to print the data.table after assigning by reference.
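To make the role of .SD more concrete, here is a small sketch that spells out what happens for row 1 only. It rebuilds the question's data as a separate data.table; the names dt, per_row, others_row1 and diff_row1 are made up for this illustration.

```r
library(data.table)

# Example data copied from the question
dt <- data.table(
  Staff = c("Jack", "Ruth", "Michael", "Ruth", "Jack", "Jack", "Ruth", "Michael"),
  Client = c("Julie", "Julie", "Julie", "Julie", "Julie", "Candice", "Candice", "Candice"),
  Assessment = c(1, 2, 2, 1, 7, 4, 1, 1),
  Staff_avg_by_client = c(4, 1.5, 2, 1.5, 4, 4, 1, 1)
)

# With one group per row, .SD is a one-row data.table holding that row
per_row <- dt[, .(staff = .SD[, Staff], client = .SD[, Client]),
              by = seq_len(nrow(dt))]
per_row[1]  # staff is "Jack", client is "Julie"

# What the inner subset computes for row 1: the mean Assessment of
# everyone except Jack among Julie's rows (the values 2, 2 and 1)
others_row1 <- dt[Staff != "Jack" & Client == "Julie", mean(Assessment)]

# Row 1 difference: Jack's average for Julie minus the others' mean
diff_row1 <- dt[1, Staff_avg_by_client] - others_row1
diff_row1
```

The value of diff_row1 is 4 - 5/3, which matches the 2.333333 shown in the first row of the output above.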
Upvotes: 1