Reputation: 119
Similar questions have been asked before where the question is how to calculate the number of observations since an event. I have a further request. How to calculate the number of days since the same type of observation but also to find the number of days since any other type of event. I also have ids.
To illustrate please see below. I am trying to do this in R using Datatables but to little result.
What I have:
date event id
2000-07-06 2 1
2000-07-07 1 1
2000-07-09 0 1
2000-07-10 0 1
2000-07-15 2 1
2000-07-16 1 1
2000-07-20 0 1
2000-07-21 1 1
2000-07-06 1 2
2000-07-07 2 2
2000-07-15 0 2
2000-07-16 0 2
2000-07-17 2 2
2000-07-18 1 2
and what I would like to have is as follows:
date event id days_since_event_1 days_since_event_2
2000-07-06 2 1 NA NA
2000-07-07 1 1 NA 1
2000-07-09 0 1 2 3
2000-07-10 0 1 3 4
2000-07-15 2 1 8 9
2000-07-16 1 1 9 1
2000-07-20 0 1 4 5
2000-07-21 1 1 5 6
2000-07-06 1 2 NA NA
2000-07-07 2 2 1 NA
2000-07-15 0 2 9 8
2000-07-16 0 2 10 9
2000-07-17 2 2 11 10
2000-07-18 1 2 12 1
The two events are mutually exclusive, that is, they cannot take place on the same day.
Upvotes: 2
Views: 203
Reputation: 4358
The following uses the Chron
Library to calculate difference in the dates
library(chron)
df$date <- chron(as.character(df$date),format=c(date="y-m-d"))
for(j in unique(df$id)){
DaysSince1 <-NA
DaysSince2 <-NA
RowsWithID <- grep(j,df$id)
for(i in RowsWithID){
df$days_since_event_1[i] <- df$date[i]-df$date[i-DaysSince1]
df$days_since_event_2[i] <- df$date[i]-df$date[i-DaysSince2]
if(df$event[i]==1){DaysSince1<-1}
else{DaysSince1<-DaysSince1+1}
if(df$event[i]==2){DaysSince2<-1}
else{DaysSince2<-DaysSince2+1}
}
}
This code gives the following results
> df
date event id days_since_event_1 days_since_event_2
1 00-07-06 2 1 NA NA
2 00-07-07 1 1 NA 1
3 00-07-09 0 1 2 3
4 00-07-10 0 1 3 4
5 00-07-15 2 1 8 9
6 00-07-16 1 1 9 1
7 00-07-20 0 1 4 5
8 00-07-21 1 1 5 6
9 00-07-06 1 2 NA NA
10 00-07-07 2 2 1 NA
11 00-07-15 0 2 9 8
12 00-07-16 0 2 10 9
13 00-07-17 2 2 11 10
14 00-07-18 1 2 12 1
To address you comment, you can do the following in Base R
to get the number of observations rather than days. No Libraries needed.
for(j in unique(df$id)){
ObsSince1 <-NA
ObsSince2 <-NA
RowsWithID <- grep(j,df$id)
for(i in RowsWithID){
df$Obs_since_event_1[i] <- ObsSince1
df$Obs_since_event_2[i] <- ObsSince2
if(df$event[i]==1){ObsSince1<-1}
else{ObsSince1<-ObsSince1+1}
if(df$event[i]==2){ObsSince2<-1}
else{ObsSince2<-ObsSince2+1}
}
}
You should get the following output
> df
date event id Obs_since_event_1 Obs_since_event_2
1 2000-07-06 2 1 NA NA
2 2000-07-07 1 1 NA 1
3 2000-07-09 0 1 1 2
4 2000-07-10 0 1 2 3
5 2000-07-15 2 1 3 4
6 2000-07-16 1 1 4 1
7 2000-07-20 0 1 1 2
8 2000-07-21 1 1 2 3
9 2000-07-06 1 2 NA NA
10 2000-07-07 2 2 1 NA
11 2000-07-15 0 2 2 1
12 2000-07-16 0 2 3 2
13 2000-07-17 2 2 4 3
14 2000-07-18 1 2 5 1
Upvotes: 2
Reputation: 400
You could subset your Dates for all with a specific event encoding, e.g.:
date.2 = DATAFRAME[which(DATAFRAME[,2]==2),1]
and then just do
DATAFRAME[which(DATAFRAME[,2]==2),5] = as.numeric(diff.Date(date.2))
and so on. Possibly this is even easier to do, but this was the first thing coming to my mind. DATAFRAME is just the name of your dataframe here.
edit: If I see it correctly you want NAs wherever ID and event column are different to each other? Then you could just go on with: DATAFRAME[which(DATAFRAME[,2] != DATAFRAME[,3]),c(4,5)] = NA or something like that
Upvotes: 2