Reputation: 13
I have a dataset showing performance of staff members against a goal. I want to add a rank column to the dataset showing who is performing the best.
The logic I want to use is to
Here is R code to create a sample of my data set:
ID <- c(1,2,3,4,5,6)
PC_achieved <- c(90,100,50,50,50,90)
Goal <- c(500,500,500,500,600,200)
df <- data.frame(ID,PC_achieved,Goal)
I currently use the below code but it gives a unique rank even if a staff member has the same values in both ranking columns.
df <- df %>%
arrange(desc(PC_achieved), desc(Goal)) %>%
mutate(
Rank = min_rank(row_number())
)
I want ID 3 and ID 4 to get the same rank of 5 not for one to be rank 5 and the other 6 when they have both achieved 50% of the same goal.
Upvotes: 0
Views: 42
Reputation: 66880
df |>
arrange(desc(PC_achieved), desc(Goal)) |>
mutate(new_rank = PC_achieved < lag(PC_achieved,1,Inf) | Goal < lag(Goal,1,0),
rank = cumsum(new_rank))
ID PC_achieved Goal new_rank rank
1 2 100 500 TRUE 1
2 1 90 500 TRUE 2
3 6 90 200 TRUE 3
4 5 50 600 TRUE 4
5 3 50 500 TRUE 5
6 4 50 500 FALSE 5
Another (kinda hacky) way to rank based on two variables would be to make a combined variable that weights the primary variable much more, such that the secondary variable is counted but never more than the primary one. Quick and dirty but doesn't expand well to more variables, and could be wrong if you ever want to rank based on very tiny changes in the primary variable.
df |>
mutate(rank = min_rank(desc(PC_achieved) * 1E6 + desc(Goal)))
ID PC_achieved Goal rank
1 1 90 500 2
2 2 100 500 1
3 3 50 500 5
4 4 50 500 5
5 5 50 600 4
6 6 90 200 3
Upvotes: 1
Reputation: 146070
As the ranking functions don't typically take more than one argument, I would do it like this:
df %>%
arrange(desc(PC_achieved), desc(Goal)) %>%
mutate(
across(c(PC_achieved, Goal), \(x) rank(desc(x)), .names = "rank_{col}")
) |>
group_by(across(starts_with("rank_"))) |>
mutate(
result = cur_group_id()
) |>
ungroup()
# A tibble: 6 × 6
# ID PC_achieved Goal rank_PC_achieved rank_Goal result
# <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 2 100 500 1 3.5 1
# 2 1 90 500 2.5 3.5 2
# 3 6 90 200 2.5 6 3
# 4 5 50 600 5 1 4
# 5 3 50 500 5 3.5 5
# 6 4 50 500 5 3.5 5
To remove the helper rank columns, stick |> select(-starts_with("rank_"))
at the end.
Upvotes: 0