M. Beausoleil
M. Beausoleil

Reputation: 3555

Calculate mean time (H:M) difference from time intervals per day within each group in R

I have a dataset that looks like this:

  id  gr       dt.DMYMS        dt.DMYMS.pos
1  A R21 01/03/19 05:39 2019-03-01 05:39:00
2  A R21 01/03/19 05:42 2019-03-01 05:42:00
3  A F23 01/03/19 06:23 2019-03-01 06:23:00
4  A F23 01/03/19 06:26 2019-03-01 06:26:00
5  A F23 01/03/19 06:28 2019-03-01 06:28:00
6  A F24 01/03/19 07:08 2019-03-01 07:08:00
[...]

Here is the code to get it:

dat = structure(list(id = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B"), gr = c("R21", "R21", "F23", "F23", 
"F23", "F24", "F24", "F25", "F25", "F26", "F26", "F27", "F27", 
"R29", "R29", "R31", "R31", "N34", "N34", "F35", "F35", "F36", 
"F36", "F37", "F37", "F39", "F39", "R40", "R40", "R17", "R17", 
"F18", "F18", "F19", "F19", "F20", "F20", "R21", "R21", "F22", 
"F22", "F23", "F23", "F25", "F25", "Gr1", "Gr1", "F28", "F28", 
"F30", "F30", "F32", "F32", "R33", "R33", "F37", "F37", "F39", 
"F39", "R40", "R40", "F41", "F41", "F42", "F42", "R43", "R43", 
"F1", "F1", "R15", "R15", "R15", "F17", "F17", "F19", "F19", 
"F20", "F20"), dt.DMYMS = c("01/03/19 05:39", "01/03/19 05:42", 
"01/03/19 06:23", "01/03/19 06:26", "01/03/19 06:28", "01/03/19 07:08", 
"01/03/19 07:10", "03/03/19 06:17", "03/03/19 06:20", "03/03/19 06:58", 
"03/03/19 07:00", "03/03/19 07:49", "03/03/19 07:50", "04/03/19 05:28", 
"04/03/19 05:31", "05/03/19 05:33", "05/03/19 05:36", "08/03/19 06:21", 
"08/03/19 06:24", "08/03/19 09:24", "08/03/19 09:28", "08/03/19 17:33", 
"08/03/19 17:39", "11/03/19 06:24", "11/03/19 06:26", "12/03/19 06:21", 
"12/03/19 06:24", "13/03/19 05:38", "13/03/19 05:40", "28/02/19 05:42", 
"28/02/19 05:46", "28/02/19 06:27", "28/02/19 06:37", "28/02/19 06:59", 
"28/02/19 07:04", "28/02/19 09:30", "28/02/19 09:32", "01/03/19 05:43", 
"01/03/19 05:47", "01/03/19 06:10", "01/03/19 06:11", "01/03/19 06:34", 
"01/03/19 06:35", "01/03/19 07:17", "01/03/19 07:17", "01/03/19 08:24", 
"01/03/19 08:26", "01/03/19 09:30", "01/03/19 09:37", "03/03/19 06:41", 
"03/03/19 06:41", "03/03/19 09:37", "03/03/19 09:37", "04/03/19 05:26", 
"04/03/19 05:54", "05/03/19 08:55", "05/03/19 09:12", "07/03/19 06:07", 
"07/03/19 06:27", "08/03/19 05:39", "08/03/19 05:55", "08/03/19 09:37", 
"08/03/19 09:51", "12/03/19 06:31", "12/03/19 06:52", "13/03/19 05:43", 
"13/03/19 05:52", "26/02/19 10:59", "26/02/19 11:57", "28/02/19 05:39", 
"28/02/19 05:49", "28/02/19 05:57", "28/02/19 06:55", "28/02/19 06:59", 
"28/02/19 07:30", "28/02/19 07:31", "28/02/19 08:50", "28/02/19 08:51"
), dt.DMYMS.pos = structure(c(1551418740, 1551418920, 1551421380, 
1551421560, 1551421680, 1551424080, 1551424200, 1551593820, 1551594000, 
1551596280, 1551596400, 1551599340, 1551599400, 1551677280, 1551677460, 
1551763980, 1551764160, 1552026060, 1552026240, 1552037040, 1552037280, 
1552066380, 1552066740, 1552285440, 1552285560, 1552371660, 1552371840, 
1552455480, 1552455600, 1551332520, 1551332760, 1551335220, 1551335820, 
1551337140, 1551337440, 1551346200, 1551346320, 1551418980, 1551419220, 
1551420600, 1551420660, 1551422040, 1551422100, 1551424620, 1551424620, 
1551428640, 1551428760, 1551432600, 1551433020, 1551595260, 1551595260, 
1551605820, 1551605820, 1551677160, 1551678840, 1551776100, 1551777120, 
1551938820, 1551940020, 1552023540, 1552024500, 1552037820, 1552038660, 
1552372260, 1552373520, 1552455780, 1552456320, 1551178740, 1551182220, 
1551332340, 1551332940, 1551333420, 1551336900, 1551337140, 1551339000, 
1551339060, 1551343800, 1551343860), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), row.names = c(NA, -78L), class = "data.frame")

I want to calculate the group by gr and calculate the average between the 2 identical groups in gr(there are 2 points per gr).

This can be done with this code:

dat %>%
  dplyr::group_by(id, gr) %>% 
  dplyr::summarise(mean.pergroup = mean(dt.DMYMS.pos)) %>% 
  dplyr::arrange(id, gr, mean.pergroup)

Which gives:

`summarise()` regrouping output by 'id' (override with `.groups` argument)
# A tibble: 38 x 3
# Groups:   id [2]
   id    gr    mean.pergroup      
   <chr> <chr> <dttm>             
 1 A     F18   2019-02-28 06:32:00
 2 A     F19   2019-02-28 07:01:30
 3 A     F20   2019-02-28 09:31:00
 4 A     F23   2019-03-01 06:25:40
 5 A     F24   2019-03-01 07:09:00
 6 A     F25   2019-03-03 06:18:30
[...]

Then I want to "group" by day and find the consecutive difference in time and calculate the average difference per day. i.e. looking at line 1, 2, and 3 above, it would be mean(07:01:30 -06:32:00 + 09:31:00 - 07:01:30) = mean(29.5 min+ 149.5min) = 89.5 min For id A on day 2019-02-28.

I'm able to get difference in time, but not for all the lines:

test = dat %>%
  dplyr::group_by(id, gr) %>% 
  dplyr::summarise(mean.pergroup = mean(dt.DMYMS.pos)) %>% 
  dplyr::arrange(id, gr, mean.pergroup)

test[1:3,"mean.pergroup"][3,] - test[1:3,"mean.pergroup"][2,]

Gives

   mean.pergroup
1 2.491667 hours

For the difference between line 3 and 2...

I can only think of a for loop, but I'm pretty sure there is a way to do that in dplyr.

Upvotes: 0

Views: 38

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389012

In your dataset there are multiple rows with date 2019-02-28 in it but if you want to consider only consecutive values you can use rleid in data.table.

dat %>%
  dplyr::group_by(id, gr) %>% 
  dplyr::summarise(mean.pergroup = mean(dt.DMYMS.pos)) %>% 
  dplyr::ungroup() %>%
  dplyr::arrange(id, gr, mean.pergroup) %>%
  dplyr::mutate(date = as.Date(mean.pergroup)) %>%
  dplyr::group_by(grp = data.table::rleid(date)) %>%
  dplyr::summarise(date = first(date),
                   difference = mean(difftime(mean.pergroup,lag(mean.pergroup), 
                                     units = 'mins'), na.rm = TRUE))

Upvotes: 1

Related Questions