Reputation: 25
I'm trying to restructure my data to recode a variable ('Event') so that I can determine the number of days between events. Essentially, I want to be able to count the number of days that occur between events occuring Importantly, I only want to start the 'count' between events after the first event has occurred for each person. Here is a sample dataframe:
Day = c(1:8,1:8)
Event = c(0,0,1,NA,0,0,1,0,0,1,NA,NA,0,1,0,1)
Person = c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
sample <- data.frame(Person,Day,Event);sample
I would like it to end up like this:
NewEvent = c(NA,NA,0,1,2,3,0,1,NA,0,1,2,3,0,1,0)
sample2 <- sample <- data.frame(Person,Day,NewEvent);sample2
I'm new to R, unfamiliar with loops or if statements, and I could not find a thread which already answered this type of issue, so any help would be greatly appreciated. Thank you!
Upvotes: 2
Views: 205
Reputation: 20095
One approach is to group on Person
and calculate distinct occurrence of events by cumsum(Event == 1)
. Now, group on both Person
and grp
to count days passed from occurrence of distinct event. The solution will be as :
library(dplyr)
sample %>% group_by(Person) %>%
mutate(EventNum = cumsum(!is.na(Event) & Event == 1)) %>%
group_by(Person, EventNum) %>%
mutate(NewEvent = ifelse(EventNum ==0, NA, row_number() - 1)) %>%
ungroup() %>%
select(Person, Day, NewEvent) %>%
as.data.frame()
# Person Day NewEvent
# 1 1 1 NA
# 2 1 2 NA
# 3 1 3 0
# 4 1 4 1
# 5 1 5 2
# 6 1 6 3
# 7 1 7 0
# 8 1 8 1
# 9 2 1 NA
# 10 2 2 0
# 11 2 3 1
# 12 2 4 2
# 13 2 5 3
# 14 2 6 0
# 15 2 7 1
# 16 2 8 0
Note: If data is not sorted
on Day
then one should add arrange(Day)
in above code.
Upvotes: 3