Reputation: 33
I am looking for the best way to loop through data and update one variable while grouping by another variable. I feel like I'm very close, but I don't have enough practice with loops in R yet to get it fully working. I would appreciate it if someone could help me out! It's my first time asking a question here, so I hope the code is helpful!
studentID <- c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,4)
lag_time <- c(0,3.8,4.6,2.6,720,3.4,200,780,860,3.5,2.5,3.3,6.68,945,7.5,2.3,1.2,3.2,83456.093,5.3,4.2,56540)
session <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
df <- data.frame(studentID, lag_time, session)
Alright, so here is what I want to do: I have a data frame of website log data arranged by studentID, and for each student I want to calculate which session they are currently in. I've already calculated lag_time, which is the time between consecutive rows; a long gap marks the start of a new session. Whenever lag_time >= 600, I want to increase the variable 'session' by 1 from that row onward, separately for each studentID. In the end, it should look like this:
studentID  lag_time   session
1          0          1
1          3.8        1
1          4.6        1
1          2.6        1
1          720        2
2          3.4        1
2          200        1
2          780        2
2          860        3
3          3.5        1
3          2.5        1
3          3.3        1
3          6.68       1
3          945        2
3          7.5        2
3          2.3        2
3          1.2        2
4          3.2        1
4          83456.093  2
4          5.3        2
4          4.2        2
4          56540      3
I hope I explained this correctly, and I'm looking forward to seeing your suggestions!
Upvotes: 0
Views: 55
Reputation: 389315
You can do this with the help of cumsum.
Using dplyr:
library(dplyr)
df %>%
  group_by(studentID) %>%
  mutate(session = session + cumsum(lag_time >= 600)) %>%
  ungroup()
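To see why cumsum does the job, it may help to walk through a single group by hand. The sketch below uses only the four lag_time values for studentID 2 from the question; the intermediate names new_session and breaks_so_far are made up for illustration.

```r
# Lag times for studentID 2, copied from the question's data
lag_time <- c(3.4, 200, 780, 860)

# TRUE wherever the gap is long enough to start a new session
new_session <- lag_time >= 600

# cumsum() treats TRUE as 1 and FALSE as 0, so this counts
# how many session breaks have occurred up to each row
breaks_so_far <- cumsum(new_session)

# Adding the starting session number (1) gives the session per row
session <- 1 + breaks_so_far
print(session)
# [1] 1 1 2 3
```

group_by(studentID) simply restarts this running count for every student.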
And in base R:
transform(df, session = session + ave(lag_time >= 600, studentID, FUN = cumsum))
#   studentID lag_time session
#1          1     0.00       1
#2          1     3.80       1
#3          1     4.60       1
#4          1     2.60       1
#5          1   720.00       2
#6          2     3.40       1
#7          2   200.00       1
#8          2   780.00       2
#9          2   860.00       3
#10         3     3.50       1
#11         3     2.50       1
#12         3     3.30       1
#13         3     6.68       1
#14         3   945.00       2
#15         3     7.50       2
#16         3     2.30       2
#17         3     1.20       2
#18         4     3.20       1
#19         4 83456.09       2
#20         4     5.30       2
#21         4     4.20       2
#22         4 56540.00       3
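Since the question asked about loops, here is a rough sketch of an explicit for loop that should give the same session column. It is only meant for comparison with the vectorised versions, and it assumes the rows are already sorted so that each student's rows are contiguous, as they are in the question's data. The variable name current is made up for illustration.

```r
# Data from the question; session starts at 1 for every row
studentID <- c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,4)
lag_time <- c(0,3.8,4.6,2.6,720,3.4,200,780,860,3.5,2.5,3.3,6.68,
              945,7.5,2.3,1.2,3.2,83456.093,5.3,4.2,56540)
df <- data.frame(studentID, lag_time, session = 1)

# Assumption: rows for each studentID are contiguous (sorted by student)
for (i in seq_len(nrow(df))) {
  if (i == 1 || df$studentID[i] != df$studentID[i - 1]) {
    # First row of a new student: restart at session 1
    current <- 1
  } else {
    # Otherwise carry on from the previous row's session
    current <- df$session[i - 1]
  }
  # A long gap starts a new session
  if (df$lag_time[i] >= 600) {
    current <- current + 1
  }
  df$session[i] <- current
}

print(df)
```

The cumsum versions are shorter and do not depend on row order within the data frame beyond the order inside each group, so the loop is mainly useful for understanding what the grouping and running count are doing.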
Upvotes: 1