Hardik Gupta
Hardik Gupta

Reputation: 4790

Rank on every change of value in R

I have a data frame like this (data frame reproducible)

dt <- read.table(text = "Email  Level
                         abc    Level_6
                         abc    Level_6
                         abc    Level_6
                         abc    Level_6
                         abc    Level_6
                         xyz    Level_5
                         xyz    Level_5
                         xyz    Level_2
                         xyz    Level_2
                         xyz    Level_3
                         pqr    Level_1
                         pqr    Level_4
                         pqr    Level_5
                         pqr    Level_5
                         pqr    Level_1", header = T)

> dt
   Email   Level
1    abc Level_6
2    abc Level_6
3    abc Level_6
4    abc Level_6
5    abc Level_6
6    xyz Level_5
7    xyz Level_5
8    xyz Level_2
9    xyz Level_2
10   xyz Level_3
11   pqr Level_1
12   pqr Level_4
13   pqr Level_5
14   pqr Level_5
15   pqr Level_1

I want to add a new column Rank, where rank starts from 1, and changes only when there is change in column Level, per Email id. If the value does not change Rank will continue with the previous value

Hence expected output is

> dt_expected
    Email   Level Rank
1    abc  Level_6    1
2    abc  Level_6    1
3    abc  Level_6    1
4    abc  Level_6    1
5    abc  Level_6    1

6    xyz  Level_5    1
7    xyz  Level_5    1
8    xyz  Level_2    2
9    xyz  Level_2    2
10   xyz  Level_3    3

11   pqr  Level_1    1
12   pqr  Level_4    2
13   pqr  Level_5    3
14   pqr  Level_5    3
15   pqr  Level_1    4

How can I achieve this in data table?

Upvotes: 3

Views: 92

Answers (1)

akrun
akrun

Reputation: 887158

We group by 'Email' and get the run-length-id of the 'Level' column that increments value by checking the adjacent elements of the column

library(data.table)
library(dplyr)
dt %>%
   group_by(Email) %>%
   mutate(Rank = rleid(Level))
# A tibble: 15 x 3
# Groups:   Email [3]
#   Email Level    Rank
#   <fct> <fct>   <int>
# 1 abc   Level_6     1
# 2 abc   Level_6     1
# 3 abc   Level_6     1
# 4 abc   Level_6     1
# 5 abc   Level_6     1
# 6 xyz   Level_5     1
# 7 xyz   Level_5     1
# 8 xyz   Level_2     2
# 9 xyz   Level_2     2
#10 xyz   Level_3     3
#11 pqr   Level_1     1
#12 pqr   Level_4     2
#13 pqr   Level_5     3
#14 pqr   Level_5     3
#15 pqr   Level_1     4

Or check the values with the next value of 'Level' create a logical index and get the cumulative sum

dt %>%
  group_by(Email) %>%
  mutate(Rank = 1 + cumsum(Level != lag(Level, default = first(Level))) )

or using data.table

library(data.table)
setDT(dt)[, Rank := rleid(Level), Email]

Or with base R

dt$Rank <- with(dt, as.integer(ave(as.character(Level), Email, 
      FUN = function(x) with(rle(x), rep(seq_along(values), lengths)))))

Upvotes: 2

Related Questions