Reputation: 10199
I have a data frame df, shown below, with 3 columns. I would like to add a new column, Code, that encodes the action column based on the other two columns.
Here is pseudocode for how to generate the Code column:
for each user_id
  for each session
    if action V comes before P then Code column value is VP
    if action P and no V then Code column value is P
    if action P comes before V then Code column value is PV
df <- read.table(text="
user_id session action
1 1 P
1 1 N
1 2 V
1 2 P
1 2 V
2 1 N
2 1 V
2 1 V
2 1 P
2 2 P", header=T)
So my expected result is:
df
user_id session action Code
1 1 P P
1 1 N P
1 2 V VPV
1 2 P VPV
1 2 V VPV
2 1 N VP
2 1 V VP
2 1 V VP
2 1 P VP
2 2 P P
No Code should be longer than three characters (VPV or PVP), so values such as VPVV or PVPV should not appear.
Upvotes: 1
Views: 108
Reputation: 206167
Here we can write a little helper function to get the code:
library(dplyr)

# Build the code for one group: keep P/V, collapse consecutive repeats, cap the length
get_code <- function(x, keep = c("P", "V"), max_len = 3) {
  as.character(x[x %in% keep]) %>%
    {rle(.)$values} %>%
    paste(collapse = "") %>%
    substr(1, max_len)
}

df %>%
  group_by(user_id, session) %>%
  mutate(code = get_code(action))
# user_id session action code
# <int> <int> <fct> <chr>
# 1 1 1 P P
# 2 1 1 N P
# 3 1 2 V VPV
# 4 1 2 P VPV
# 5 1 2 V VPV
# 6 2 1 N VP
# 7 2 1 V VP
# 8 2 1 V VP
# 9 2 1 P VP
# 10 2 2 P P
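rle() collapses each run of consecutive duplicate values into a single value, which gives the sequence of actions without adjacent repeats. We then paste those values together in the order they appeared and truncate the result to max_len characters.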
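To make the individual steps easier to follow, here is a minimal base R sketch that walks through them for a single session. The vectors `actions` and `long` are illustrative inputs (the first mirrors user_id 2, session 1 from the question), and the intermediate names `kept`, `runs` and `code` are only used here for explanation.

```r
# Illustrative single-session input (user_id 2, session 1 in the question)
actions <- c("N", "V", "V", "P")

# Step 1: keep only the actions of interest
kept <- actions[actions %in% c("P", "V")]   # "V" "V" "P"

# Step 2: rle() collapses each run of consecutive repeats into one value
runs <- rle(kept)$values                     # "V" "P"

# Step 3: join the run values and cap the length at 3 characters
code <- substr(paste(runs, collapse = ""), 1, 3)
print(code)  # [1] "VP"

# rle() only merges adjacent repeats, so alternating actions are all kept
# and the cap is what trims "PVPV" down to "PVP"
long <- c("P", "V", "P", "V")
long_code <- substr(paste(rle(long)$values, collapse = ""), 1, 3)
print(long_code)  # [1] "PVP"
```

Note that rle() does not remove duplicates globally: a V that reappears after a P starts a new run, which is why a session such as V, P, V yields VPV rather than VP.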
Upvotes: 3