Aruna

Reputation: 1

How to create a new column with an incrementing sequence number in an R dataframe, such that it gets incremented based on other column values

This is what my simplified dataframe looks like -

App       IsNewSession
Word         TRUE   
Excel        FALSE   
Chrome       TRUE  
Notepad      FALSE  
Chrome       FALSE  
Notepad      FALSE  
Excel        TRUE  
Chrome       FALSE

I need to create a new column called SessionNumber. Each time IsNewSession = TRUE, the session number should be the previous row's session number + 1. Otherwise, it just retains the same session number as the previous row.

Desired data frame -

App     IsNewSession   SessionNumber
Word     TRUE            1
Excel    FALSE           1
Chrome   TRUE            2
Notepad  FALSE           2
Chrome   FALSE           2 
Notepad  FALSE           2
Excel    TRUE            3
Chrome   FALSE           3

I can do this using a for loop but my dataframe is pretty large (250K rows) and it takes a really long time.

I tried using mutate like this, but that doesn't work either.

df$SessionNumber = 1

library(dplyr)

df <- df %>% 
  mutate(SessionNumber = ifelse(IsNewSession, lag(SessionNumber) + 1, lag(SessionNumber)))

What is a good performant way to do this in R?

Thanks!

Upvotes: 0

Views: 338

Answers (1)

troh

Reputation: 1364

The solution suggested in the comment doesn't work if the first value of IsNewSession is FALSE. This version handles that case:

df$SessionNumber <- cumsum(df$IsNewSession) + as.numeric(!df$IsNewSession[1])
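
As a quick check, here is a minimal sketch that rebuilds the sample data from the question and applies the cumsum approach, taking the first-row offset from IsNewSession[1]. The second data frame, df2, is a hypothetical variant (not from the question) whose first row is FALSE, to show why the offset is there.

```r
# Sample data from the question
df <- data.frame(
  App = c("Word", "Excel", "Chrome", "Notepad",
          "Chrome", "Notepad", "Excel", "Chrome"),
  IsNewSession = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# cumsum() counts TRUE as 1 and FALSE as 0, so the running total
# increases by one on every row that starts a new session.
# The offset adds 1 only when the first row is FALSE, so that
# numbering still starts at 1 instead of 0.
df$SessionNumber <- cumsum(df$IsNewSession) + as.numeric(!df$IsNewSession[1])
print(df)

# Hypothetical variant: same data, but the first row is FALSE
df2 <- df
df2$IsNewSession[1] <- FALSE
df2$SessionNumber <- cumsum(df2$IsNewSession) + as.numeric(!df2$IsNewSession[1])
print(df2)
```

Both calls are fully vectorized, so this should stay fast on a few hundred thousand rows, unlike a row-by-row loop.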

Upvotes: 1
