albit paoli
albit paoli

Reputation: 161

Date difference per ID with dplyr without hours, minutes or seconds

I'm trying to do something very similar to what was done in the post below. Using dplyr. However, I would like to avoid the conversion from minutes to months, as the highest precision of the date variable in my current database is months..

Calculating time difference by ID

   Incident.ID.. = c(rep("INCFI0000029582",4), rep("INCFI0000029587",4))
   date = c("2014-09-11", "2014-10-11", "2014-11-11", "2014-12-11", "2015-01-11", "2015-02-11", "2015-03-11", "2015-04-11")
   df = data.frame(Incident.ID..,date, stringsAsFactors = FALSE)

library(dplyr)
library(lubridate)
df %>%
    group_by(Incident.ID..) %>%
    mutate(diff = c(0, diff(ymd_hms(date))))

Source: local data frame [8 x 3]
Groups: Incident.ID..

 `enter code here`   Incident.ID..        date.diff
1 INCFI0000029582 2014-09-25 08:39:45       0
2 INCFI0000029582 2014-09-25 08:39:48       3
3 INCFI0000029582 2014-09-25 08:40:44      56
4 INCFI0000029582 2014-10-10 23:04:00 1347796
5 INCFI0000029587 2014-09-25 08:33:32       0
6 INCFI0000029587 2014-09-25 08:34:41      69
7 INCFI0000029587 2014-09-25 08:35:24      43
8 INCFI0000029587 2014-10-10 23:04:00 1348116

I tried converting the date variable to as.Date format and using the dplyr functions as below

library(dplyr)
library(lubridate)
df %>%
  group_by(Incident.ID..) %>%
  mutate(diff = c(0, diff(date)))

But I'm getting this result:

    Incident.ID..       date  diff
            <chr>     <date> <dbl>
1 INCFI0000029582 2014-09-11     0
2 INCFI0000029582 2014-10-11    30
3 INCFI0000029582 2014-11-11    31
4 INCFI0000029582 2014-12-11    30
5 INCFI0000029587 2015-01-11    31
6 INCFI0000029587 2015-02-11    31
7 INCFI0000029587 2015-03-11    28
8 INCFI0000029587 2015-04-11    31

I would like to get to the result shown below (date difference calculated using the first encounter per ID):

    Incident.ID..       date  diff
            <chr>     <date> <dbl>
1 INCFI0000029582 2014-09-11     0
2 INCFI0000029582 2014-10-11    30
3 INCFI0000029582 2014-11-11    61
4 INCFI0000029582 2014-12-11    91
5 INCFI0000029587 2015-01-11    0
6 INCFI0000029587 2015-02-11    31
7 INCFI0000029587 2015-03-11    59
8 INCFI0000029587 2015-04-11    90

Any idea how to get this done? Thank you for your help.

Albit

Upvotes: 1

Views: 304

Answers (1)

Ivan Lozo
Ivan Lozo

Reputation: 26

Solution could be to get minimum date per ID in a new column and then subtract it from each date.

Min_date<-df%>%group_by(Incident.ID..)%>%summarise(min_date=min(date))

df<-df%>%left_join(Min_date,by="Incident.ID..")

Upvotes: 1

Related Questions