Reputation: 13

I'm struggling to rank rows by Col1 and then if there are ties by Col2 and if there are still ties to return the same rank

I have a dataset showing performance of staff members against a goal. I want to add a rank column to the dataset showing who is performing the best.

The logic I want to use is to

first sort by PC_achieved (the % of the goal the staff member has achieved to date)
If there are ties then by Goal so that staff members who have achieved the same % but of a larger goal are ranked ahead

Here is R code to create a sample of my data set:

ID <- c(1,2,3,4,5,6)
PC_achieved <- c(90,100,50,50,50,90)
Goal <- c(500,500,500,500,600,200)

df <- data.frame(ID,PC_achieved,Goal)

I currently use the below code but it gives a unique rank even if a staff member has the same values in both ranking columns.

df <- df %>%
  arrange(desc(PC_achieved), desc(Goal)) %>%
  mutate(
    Rank = min_rank(row_number())
  )

I want ID 3 and ID 4 to get the same rank of 5 not for one to be rank 5 and the other 6 when they have both achieved 50% of the same goal.

Upvotes: 0

Answers (2)

Jon Spring

Reputation: 66880

df |>
  arrange(desc(PC_achieved), desc(Goal)) |>
  mutate(new_rank = PC_achieved < lag(PC_achieved,1,Inf) | Goal < lag(Goal,1,0),
         rank = cumsum(new_rank))

  ID PC_achieved Goal new_rank rank
1  2         100  500     TRUE    1
2  1          90  500     TRUE    2
3  6          90  200     TRUE    3
4  5          50  600     TRUE    4
5  3          50  500     TRUE    5
6  4          50  500    FALSE    5

Another (kinda hacky) way to rank based on two variables would be to make a combined variable that weights the primary variable much more, such that the secondary variable is counted but never more than the primary one. Quick and dirty but doesn't expand well to more variables, and could be wrong if you ever want to rank based on very tiny changes in the primary variable.

df |>
   mutate(rank = min_rank(desc(PC_achieved) * 1E6 + desc(Goal)))


  ID PC_achieved Goal rank
1  1          90  500    2
2  2         100  500    1
3  3          50  500    5
4  4          50  500    5
5  5          50  600    4
6  6          90  200    3

Upvotes: 1

Gregor Thomas

Reputation: 146070

As the ranking functions don't typically take more than one argument, I would do it like this:

df %>%
  arrange(desc(PC_achieved), desc(Goal)) %>%
  mutate(
   across(c(PC_achieved, Goal), \(x) rank(desc(x)), .names = "rank_{col}")
  ) |>
  group_by(across(starts_with("rank_"))) |>
  mutate(
    result = cur_group_id()
  ) |>
  ungroup() 
# A tibble: 6 × 6
#      ID PC_achieved  Goal rank_PC_achieved rank_Goal result
#   <dbl>       <dbl> <dbl>            <dbl>     <dbl>  <int>
# 1     2         100   500              1         3.5      1
# 2     1          90   500              2.5       3.5      2
# 3     6          90   200              2.5       6        3
# 4     5          50   600              5         1        4
# 5     3          50   500              5         3.5      5
# 6     4          50   500              5         3.5      5

To remove the helper rank columns, stick |> select(-starts_with("rank_")) at the end.

Upvotes: 0

I&#39;m struggling to rank rows by Col1 and then if there are ties by Col2 and if there are still ties to return the same rank

Answers (2)

Related Questions

I'm struggling to rank rows by Col1 and then if there are ties by Col2 and if there are still ties to return the same rank