James
James

Reputation: 526

Rolling Window based on a fixed time interval

I'm trying to calculate a rolling window in a fixed time interval. Suppose that the interval is 48 hours. I would like to get every data point that is contained between the date of the current observation and 48 hours before that observation. For example, if the datetime of the current observation is 05-07-2022 14:15:28, for that position, I would like a count value for every occurence between that date and 03-07-2022 14:15:28. Seconds are not fundamental to the analysis.

library(tidyverse)
library(lubridate)

df = tibble(id = 1:7,
            date_time = ymd_hm('2022-05-07 15:00', '2022-05-09 13:45', '2022-05-09 13:51', '2022-05-09 17:00',
                               '2022-05-10 15:25', '2022-05-10 17:18', '2022-05-11 14:00'))


# A tibble: 7 × 2
     id date_time          
  <int> <dttm>             
1     1 2022-05-07 15:00:00
2     2 2022-05-09 13:45:00
3     3 2022-05-09 13:51:00
4     4 2022-05-09 17:00:00
5     5 2022-05-10 15:25:00
6     6 2022-05-10 17:18:00
7     7 2022-05-11 14:00:00

With the example window of 48 hours, that would yield:

# A tibble: 7 × 4
     id date_time           lag_48hours         count
  <int> <dttm>              <dttm>              <dbl>
1     1 2022-05-07 15:00:00 2022-05-05 15:00:00     1
2     2 2022-05-09 13:45:00 2022-05-07 13:45:00     2
3     3 2022-05-09 13:51:00 2022-05-07 13:51:00     3
4     4 2022-05-09 17:00:00 2022-05-07 17:00:00     3
5     5 2022-05-10 15:25:00 2022-05-08 15:25:00     4
6     6 2022-05-10 17:18:00 2022-05-08 17:18:00     5
7     7 2022-05-11 14:00:00 2022-05-09 14:00:00     4

I added the lag column for illustration purposes. Any idea how to obtain the count column? I need to be able to adjust the window (48 hours in this example).

Upvotes: 0

Views: 134

Answers (2)

Davis Vaughan
Davis Vaughan

Reputation: 2950

I'd encourage you to use slider, which allows you to do rolling window analysis using an irregular index.

library(tidyverse)
library(lubridate)
library(slider)

df = tibble(
  id = 1:7,
  date_time = ymd_hm(
    '2022-05-07 15:00', '2022-05-09 13:45', '2022-05-09 13:51', '2022-05-09 17:00',
    '2022-05-10 15:25', '2022-05-10 17:18', '2022-05-11 14:00'
  )
)

df %>%
  mutate(
    count = slide_index_int(
      .x = id, 
      .i = date_time, 
      .f = length, 
      .before = dhours(48)
    )
  )
#> # A tibble: 7 × 3
#>      id date_time           count
#>   <int> <dttm>              <int>
#> 1     1 2022-05-07 15:00:00     1
#> 2     2 2022-05-09 13:45:00     2
#> 3     3 2022-05-09 13:51:00     3
#> 4     4 2022-05-09 17:00:00     3
#> 5     5 2022-05-10 15:25:00     4
#> 6     6 2022-05-10 17:18:00     5
#> 7     7 2022-05-11 14:00:00     4

Upvotes: 2

Andrew Gustar
Andrew Gustar

Reputation: 18425

How about this...

df %>% 
  mutate(count48 = map_int(date_time, 
                           ~sum(date_time <= . & date_time > . - 48 * 60 * 60)))

# A tibble: 7 × 3
     id date_time           count48
  <int> <dttm>                <int>
1     1 2022-05-07 15:00:00       1
2     2 2022-05-09 13:45:00       2
3     3 2022-05-09 13:51:00       3
4     4 2022-05-09 17:00:00       3
5     5 2022-05-10 15:25:00       4
6     6 2022-05-10 17:18:00       5
7     7 2022-05-11 14:00:00       4

Upvotes: 1

Related Questions