Cina

Creating a new column based on conditions in other columns in R

I have a data frame df (below) with 3 columns. I would like to add a new column, Code, that encodes the action column within each combination of the other two columns (user_id and session). Here is pseudocode for generating the Code column:

for each user_id  
    for each session
        if action V comes before P then Code column value is VP
        if action P and no V then Code column value is P
        if action P comes before V then Code column value is PV 

Here is the data:

df <- read.table(text="
user_id  session   action
1          1         P
1          1         N
1          2         V
1          2         P         
1          2         V
2          1         N
2          1         V
2          1         V
2          1         P
2          2         P", header = TRUE)

The expected result is:

df
user_id  session   action   Code
1          1         P       P
1          1         N       P
1          2         V       VPV
1          2         P       VPV  
1          2         V       VPV
2          1         N       VP
2          1         V       VP
2          1         V       VP
2          1         P       VP
2          2         P       P

No Code value should be longer than three letters (VPV or PVP), so we should never get values such as VPVV or PVPV.

Answers (1)

MrFlick

Here we can write a small helper function to compute the code and apply it within each user_id/session group using dplyr:

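library(dplyr)  # provides %>%, group_by() and mutate()

# Collapse a group's actions into a code such as "VP" or "VPV":
# keep only P/V, drop consecutive repeats with rle(), paste the
# remaining values together and truncate to max_len characters
get_code <- function(x, keep=c("P","V"), max_len=3) {
  as.character(x[x %in% keep]) %>% 
    {rle(.)$values} %>% 
    paste(collapse="") %>% 
    substr(1, max_len)
}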

df %>% 
  group_by(user_id, session) %>% 
  mutate(code=get_code(action))
#    user_id session action code 
#      <int>   <int> <fct>  <chr>
#  1       1       1 P      P    
#  2       1       1 N      P    
#  3       1       2 V      VPV  
#  4       1       2 P      VPV  
#  5       1       2 V      VPV  
#  6       2       1 N      VP   
#  7       2       1 V      VP   
#  8       2       1 V      VP   
#  9       2       1 P      VP   
# 10       2       2 P      P

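rle() (run-length encoding) collapses runs of consecutive identical values, so rle(.)$values gives the actions in order with adjacent duplicates removed (V, V, P becomes V, P, while V, P, V stays V, P, V). We then paste those values together in the order they appeared, and substr() caps the result at max_len characters so nothing longer than VPV or PVP is produced.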
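A minimal base-R sketch of the same approach, assuming you would rather not depend on dplyr. get_code_base() is a hypothetical stand-in for get_code() above, and ave() is used in place of group_by() plus mutate() to apply it within each user_id/session group. The two rle() calls at the top illustrate that only adjacent repeats are collapsed.

```r
# Base-R sketch (no dplyr): same idea as get_code(), applied per group with ave()
df <- read.table(text = "
user_id session action
1 1 P
1 1 N
1 2 V
1 2 P
1 2 V
2 1 N
2 1 V
2 1 V
2 1 P
2 2 P", header = TRUE)

# rle() only drops *consecutive* repeats
print(rle(c("V", "V", "P"))$values)  # prints [1] "V" "P"
print(rle(c("V", "P", "V"))$values)  # prints [1] "V" "P" "V"

# Hypothetical base-R counterpart of the answer's get_code()
get_code_base <- function(x, keep = c("P", "V"), max_len = 3) {
  x <- as.character(x)                            # works for factor or character input
  vals <- rle(x[x %in% keep])$values              # keep P/V, collapse adjacent repeats
  substr(paste(vals, collapse = ""), 1, max_len)  # join and cap the length
}

# ave() applies FUN within each user_id/session group and recycles the
# single result over every row of that group
df$code <- ave(as.character(df$action), df$user_id, df$session,
               FUN = get_code_base)
print(df)
```

Because ave() recycles each group's one-element result across the group's rows, df$code should match the dplyr output shown earlier.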
