Reputation: 117
I have a dataframe like the following:
df <- data.frame("id" = c(111,111,111,111,222,222,222,222,222,333,333,333),
                 "Encounter" = c(1,2,3,4,1,2,3,4,5,1,2,3),
                 "Level" = c(1,1,2,3,3,4,1,2,3,3,4,4),
                 "Gap_Days" = c(NA,3,2,15,NA,1,18,3,2,NA,77,1))
df
    id Encounter Level Gap_Days
1  111         1     1       NA
2  111         2     1        3
3  111         3     2        2
4  111         4     3       15
5  222         1     3       NA
6  222         2     4        1
7  222         3     1       18
8  222         4     2        3
9  222         5     3        2
10 333         1     3       NA
11 333         2     4       77
12 333         3     4        1
Where Level is a numeric value signaling the type of encounter, and Gap_Days is the number of days since the previous encounter and is thus NA for the first encounter in each id group.
I'm looking to create a variable, "Session", that starts at 1 for the first Encounter within an id group and increases by 1 whenever Level fails to increase from the previous encounter, or whenever more than 3 days have passed between encounters. In other words, an Encounter stays in the current "Session" only if its Level is higher than the previous one and it occurs within 3 days of it; otherwise it starts a new "Session". I'd like to do this within each group, ideally resulting in something like:
df2 <- data.frame("id" = c(111,111,111,111,222,222,222,222,222,333,333,333),
                  "Encounter" = c(1,2,3,4,1,2,3,4,5,1,2,3),
                  "Level" = c(1,1,2,3,3,4,1,2,3,3,4,4),
                  "Gap_Days" = c(NA,3,2,15,NA,1,18,3,2,NA,77,1),
                  "Session" = c(1,2,2,3,1,1,2,2,2,1,2,3))
df2
    id Encounter Level Gap_Days Session
1  111         1     1       NA       1
2  111         2     1        3       2
3  111         3     2        2       2
4  111         4     3       15       3
5  222         1     3       NA       1
6  222         2     4        1       1
7  222         3     1       18       2
8  222         4     2        3       2
9  222         5     3        2       2
10 333         1     3       NA       1
11 333         2     4       77       2
12 333         3     4        1       3
In the actual data there is no strict limit on the number of Encounters or Sessions within each group. The first encounter can begin at any level, and the level need not increase by exactly 1; for example, if the level jumped from 1 to 4 between encounters, that could still be considered the same Session.
I'd prefer a dplyr solution, but am open to any ideas to help accomplish this!
Upvotes: 0
Views: 64
Reputation: 50738
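You can do the following:
library(dplyr)
# New session on the first row of each id, when Level fails to increase
# (diff <= 0, so drops count as well as ties), or when Gap_Days exceeds 3.
df %>%
  group_by(id) %>%
  mutate(Session = cumsum(c(TRUE, diff(Level) <= 0) | Gap_Days > 3))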
## A tibble: 12 x 5
## Groups: id [3]
# id Encounter Level Gap_Days Session
# <dbl> <dbl> <dbl> <dbl> <int>
# 1 111 1 1 NA 1
# 2 111 2 1 3 2
# 3 111 3 2 2 2
# 4 111 4 3 15 3
# 5 222 1 3 NA 1
# 6 222 2 4 1 1
# 7 222 3 1 18 2
# 8 222 4 2 3 2
# 9 222 5 3 2 2
#10 333 1 3 NA 1
#11 333 2 4 77 2
#12 333 3 4 1 3
You will probably want to call ungroup() afterwards, so that later operations are not silently applied per id.
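If the one-liner is hard to read, the same logic can be unpacked into intermediate columns. The following is only a sketch under the rules stated in the question; the helper column names level_reset and long_gap are my own and do not appear in the original data. It flags a row when Level did not rise relative to the previous row, flags a row when more than 3 days passed, and then counts the flags cumulatively within each id.

```r
library(dplyr)

df <- data.frame(
  id        = c(111,111,111,111,222,222,222,222,222,333,333,333),
  Encounter = c(1,2,3,4,1,2,3,4,5,1,2,3),
  Level     = c(1,1,2,3,3,4,1,2,3,3,4,4),
  Gap_Days  = c(NA,3,2,15,NA,1,18,3,2,NA,77,1)
)

res <- df %>%
  group_by(id) %>%
  mutate(
    # TRUE when Level did not increase. The first row of each id has no
    # predecessor, so lag() yields NA and coalesce() turns that into TRUE.
    level_reset = coalesce(Level <= lag(Level), TRUE),
    # TRUE when more than 3 days passed. The NA on each first encounter
    # becomes FALSE, since level_reset already starts the session there.
    long_gap    = coalesce(Gap_Days > 3, FALSE),
    # Every TRUE opens a new session, so the running count is the session id.
    Session     = cumsum(level_reset | long_gap)
  ) %>%
  ungroup()  # drop the grouping so later steps see a plain tibble

res
```

Comparing with <= rather than == means that a drop in Level within 3 days also opens a new session, which is how I read "fails to increase" in the question. Keeping the two flags as separate columns also makes it easy to check afterwards which condition triggered each new session; they can be removed with select(-level_reset, -long_gap) once no longer needed.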
Upvotes: 3