leary9009
leary9009

Reputation: 25

How to count number of days between events in a dataset

I'm trying to restructure my data to recode a variable ('Event') so that I can determine the number of days between events. Essentially, I want to be able to count the number of days that occur between events occuring Importantly, I only want to start the 'count' between events after the first event has occurred for each person. Here is a sample dataframe:

Day = c(1:8,1:8)
Event = c(0,0,1,NA,0,0,1,0,0,1,NA,NA,0,1,0,1)
Person = c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
sample <- data.frame(Person,Day,Event);sample

I would like it to end up like this:

NewEvent = c(NA,NA,0,1,2,3,0,1,NA,0,1,2,3,0,1,0)
sample2 <- sample <- data.frame(Person,Day,NewEvent);sample2

I'm new to R, unfamiliar with loops or if statements, and I could not find a thread which already answered this type of issue, so any help would be greatly appreciated. Thank you!

Upvotes: 2

Views: 205

Answers (1)

MKR
MKR

Reputation: 20095

One approach is to group on Person and calculate distinct occurrence of events by cumsum(Event == 1). Now, group on both Person and grp to count days passed from occurrence of distinct event. The solution will be as :

library(dplyr)

sample %>% group_by(Person) %>%
  mutate(EventNum = cumsum(!is.na(Event) & Event == 1)) %>%
  group_by(Person, EventNum) %>%
  mutate(NewEvent = ifelse(EventNum ==0, NA, row_number() - 1)) %>%
  ungroup() %>%
  select(Person, Day, NewEvent) %>%
  as.data.frame()


#    Person Day NewEvent
# 1       1   1       NA
# 2       1   2       NA
# 3       1   3        0
# 4       1   4        1
# 5       1   5        2
# 6       1   6        3
# 7       1   7        0
# 8       1   8        1
# 9       2   1       NA
# 10      2   2        0
# 11      2   3        1
# 12      2   4        2
# 13      2   5        3
# 14      2   6        0
# 15      2   7        1
# 16      2   8        0

Note: If data is not sorted on Day then one should add arrange(Day) in above code.

Upvotes: 3

Related Questions