Jhonathan
Jhonathan

Reputation: 375

How to group data by time difference in r?

I want to group my dataset based on a posxict column. I need to group where the difference in time between the observations is no more than 3 hours compared to the previous observation. So this is a sample of my data:

time                type      day month gas           response
   <dttm>              <chr>   <int> <dbl> <chr>            <dbl>
 1 2018-08-21 14:30:00 old_std    21     8 benzene_area 10260721 
 2 2018-08-21 15:30:00 old_std    21     8 benzene_area  2591905 
 3 2018-11-09 20:00:00 old_std     9    11 benzene_area   684166
 4 2018-11-09 23:00:00 old_std     9    11 benzene_area   589498
 5 2018-11-10 02:00:00 old_std    10    11 benzene_area   460714
 6 2018-11-10 05:00:00 old_std    10    11 benzene_area   490663
 7 2018-11-10 11:30:00 old_std    10    11 benzene_area   404947
 8 2018-11-10 14:30:00 old_std    10    11 benzene_area   402566
 9 2018-11-10 16:30:00 old_std    10    11 benzene_area   362376
10 2018-11-11 00:00:00 old_std    11    11 benzene_area   276184

And this is what I am looking for:

time                type      day month gas           response   group
   <dttm>              <chr>   <int> <dbl> <chr>            <dbl>
 1 2018-08-21 14:30:00 old_std    21     8 benzene_area 10260721   1
 2 2018-08-21 15:30:00 old_std    21     8 benzene_area  2591905   1
 3 2018-11-09 20:00:00 old_std     9    11 benzene_area   684166   2
 4 2018-11-09 23:00:00 old_std     9    11 benzene_area   589498   2
 5 2018-11-10 02:00:00 old_std    10    11 benzene_area   460714   2
 6 2018-11-10 05:00:00 old_std    10    11 benzene_area   490663   2
 7 2018-11-10 11:30:00 old_std    10    11 benzene_area   404947   3
 8 2018-11-10 14:30:00 old_std    10    11 benzene_area   402566   3
 9 2018-11-10 16:30:00 old_std    10    11 benzene_area   362376   3
10 2018-11-11 00:00:00 old_std    11    11 benzene_area   276184   4

I hadn't found a way to group the data like this. Any ideas?

Upvotes: 0

Views: 177

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389235

You can use lag to get previous value and use it in difftime to get difference in hours and increment the group value everytime the difference is greater than 3 hours.

library(dplyr)
df %>%
        mutate(group = cumsum(difftime(time, lag(time, default = first(time)), 
                              units = "hours") > 3) + 1)

#                  time    type day month          gas response group
#1  2018-08-21 14:30:00 old_std  21     8 benzene_area 10260721     1
#2  2018-08-21 15:30:00 old_std  21     8 benzene_area  2591905     1
#3  2018-11-09 20:00:00 old_std   9    11 benzene_area   684166     2
#4  2018-11-09 23:00:00 old_std   9    11 benzene_area   589498     2
#5  2018-11-10 02:00:00 old_std  10    11 benzene_area   460714     2
#6  2018-11-10 05:00:00 old_std  10    11 benzene_area   490663     2
#7  2018-11-10 11:30:00 old_std  10    11 benzene_area   404947     3
#8  2018-11-10 14:30:00 old_std  10    11 benzene_area   402566     3
#9  2018-11-10 16:30:00 old_std  10    11 benzene_area   362376     3
#10 2018-11-11 00:00:00 old_std  11    11 benzene_area   276184     4

In base R, that would be :

df$group <- cumsum(c(TRUE, difftime(df$time[-1], df$time[-nrow(df)],
                    units = "hours") > 3))

Upvotes: 3

Related Questions