Reputation: 4790
I have a data frame like this (data frame reproducible)
dt <- read.table(text = "Email Level
abc Level_6
abc Level_6
abc Level_6
abc Level_6
abc Level_6
xyz Level_5
xyz Level_5
xyz Level_2
xyz Level_2
xyz Level_3
pqr Level_1
pqr Level_4
pqr Level_5
pqr Level_5
pqr Level_1", header = T)
> dt
Email Level
1 abc Level_6
2 abc Level_6
3 abc Level_6
4 abc Level_6
5 abc Level_6
6 xyz Level_5
7 xyz Level_5
8 xyz Level_2
9 xyz Level_2
10 xyz Level_3
11 pqr Level_1
12 pqr Level_4
13 pqr Level_5
14 pqr Level_5
15 pqr Level_1
I want to add a new column Rank, where rank starts from 1, and changes only when there is change in column Level, per Email id. If the value does not change Rank will continue with the previous value
Hence expected output is
> dt_expected
Email Level Rank
1 abc Level_6 1
2 abc Level_6 1
3 abc Level_6 1
4 abc Level_6 1
5 abc Level_6 1
6 xyz Level_5 1
7 xyz Level_5 1
8 xyz Level_2 2
9 xyz Level_2 2
10 xyz Level_3 3
11 pqr Level_1 1
12 pqr Level_4 2
13 pqr Level_5 3
14 pqr Level_5 3
15 pqr Level_1 4
How can I achieve this in data table?
Upvotes: 3
Views: 92
Reputation: 887158
We group by 'Email' and get the run-length-id of the 'Level' column that increments value by checking the adjacent elements of the column
library(data.table)
library(dplyr)
dt %>%
group_by(Email) %>%
mutate(Rank = rleid(Level))
# A tibble: 15 x 3
# Groups: Email [3]
# Email Level Rank
# <fct> <fct> <int>
# 1 abc Level_6 1
# 2 abc Level_6 1
# 3 abc Level_6 1
# 4 abc Level_6 1
# 5 abc Level_6 1
# 6 xyz Level_5 1
# 7 xyz Level_5 1
# 8 xyz Level_2 2
# 9 xyz Level_2 2
#10 xyz Level_3 3
#11 pqr Level_1 1
#12 pqr Level_4 2
#13 pqr Level_5 3
#14 pqr Level_5 3
#15 pqr Level_1 4
Or check the values with the next value of 'Level' create a logical index and get the cumulative sum
dt %>%
group_by(Email) %>%
mutate(Rank = 1 + cumsum(Level != lag(Level, default = first(Level))) )
or using data.table
library(data.table)
setDT(dt)[, Rank := rleid(Level), Email]
Or with base R
dt$Rank <- with(dt, as.integer(ave(as.character(Level), Email,
FUN = function(x) with(rle(x), rep(seq_along(values), lengths)))))
Upvotes: 2