Reputation: 161
I'm trying to do something very similar to what was done in the post below. Using dplyr. However, I would like to avoid the conversion from minutes to months, as the highest precision of the date variable in my current database is months..
Calculating time difference by ID
Incident.ID.. = c(rep("INCFI0000029582",4), rep("INCFI0000029587",4))
date = c("2014-09-11", "2014-10-11", "2014-11-11", "2014-12-11", "2015-01-11", "2015-02-11", "2015-03-11", "2015-04-11")
df = data.frame(Incident.ID..,date, stringsAsFactors = FALSE)
library(dplyr)
library(lubridate)
df %>%
group_by(Incident.ID..) %>%
mutate(diff = c(0, diff(ymd_hms(date))))
Source: local data frame [8 x 3]
Groups: Incident.ID..
`enter code here` Incident.ID.. date.diff
1 INCFI0000029582 2014-09-25 08:39:45 0
2 INCFI0000029582 2014-09-25 08:39:48 3
3 INCFI0000029582 2014-09-25 08:40:44 56
4 INCFI0000029582 2014-10-10 23:04:00 1347796
5 INCFI0000029587 2014-09-25 08:33:32 0
6 INCFI0000029587 2014-09-25 08:34:41 69
7 INCFI0000029587 2014-09-25 08:35:24 43
8 INCFI0000029587 2014-10-10 23:04:00 1348116
I tried converting the date variable to as.Date format and using the dplyr functions as below
library(dplyr)
library(lubridate)
df %>%
group_by(Incident.ID..) %>%
mutate(diff = c(0, diff(date)))
But I'm getting this result:
Incident.ID.. date diff
<chr> <date> <dbl>
1 INCFI0000029582 2014-09-11 0
2 INCFI0000029582 2014-10-11 30
3 INCFI0000029582 2014-11-11 31
4 INCFI0000029582 2014-12-11 30
5 INCFI0000029587 2015-01-11 31
6 INCFI0000029587 2015-02-11 31
7 INCFI0000029587 2015-03-11 28
8 INCFI0000029587 2015-04-11 31
I would like to get to the result shown below (date difference calculated using the first encounter per ID):
Incident.ID.. date diff
<chr> <date> <dbl>
1 INCFI0000029582 2014-09-11 0
2 INCFI0000029582 2014-10-11 30
3 INCFI0000029582 2014-11-11 61
4 INCFI0000029582 2014-12-11 91
5 INCFI0000029587 2015-01-11 0
6 INCFI0000029587 2015-02-11 31
7 INCFI0000029587 2015-03-11 59
8 INCFI0000029587 2015-04-11 90
Any idea how to get this done? Thank you for your help.
Albit
Upvotes: 1
Views: 304
Reputation: 26
Solution could be to get minimum date per ID in a new column and then subtract it from each date.
Min_date<-df%>%group_by(Incident.ID..)%>%summarise(min_date=min(date))
df<-df%>%left_join(Min_date,by="Incident.ID..")
Upvotes: 1