biostatguy12
biostatguy12

Reputation: 659

Create a count consecutive variable which resets to 1 based on POSIXct date

Follow up to this: Create a count-consecutive variable which resets to 1

and the solution worked great. Now I have below, where date is POSixct:

df<-data.frame(group=c(1, 1, 1, 1, 2, 2, 2), 
               date=c("2000-01-01 00:00:00", "2000-01-03 00:00:00", "2000-01-04 07:07:40", "2000-01-05 09:09:00", "2000-01-09 00:00:00", "2000-01-10 14:00:00", "2000-01-11 13:00:00"),
               want=c(1,1,2,3,1,2,1),
               want2=c(3,3,3,3,2,2,2))
library(anytime)
df<-df %>% mutate(date = anytime::anytime(str_c(date, sep= ' ')))
  group                date want want2
1     1 2000-01-01 00:00:00    1     3
2     1 2000-01-03 00:00:00    1     3
3     1 2000-01-04 07:07:40    2     3
4     1 2000-01-05 09:09:00    3     3
5     2 2000-01-09 00:00:00    1     2
6     2 2000-01-10 14:00:00    2     2
7     2 2000-01-11 13:00:00    1     2

I want to begin counting when the 'next day' is after 24 hrs but before 48 hrs.

Trying this without success, because I think the diff function gives me a result in seconds:

df %>%
    group_by(group) %>%
    group_by(group2 = cumsum(c(TRUE, diff(date)<86400&diff(date)>172800))), add = TRUE) %>%
    mutate(wantn = row_number()) %>%
    group_by(group) %>%
    mutate(want2n = max(wantn)) %>%       
    select(-group2)

Upvotes: 0

Views: 81

Answers (1)

Uwe
Uwe

Reputation: 42582

Here, difftime() is a better choice than diff() because the units can be specified.

If I understand correctly, a sequence of POSIXct timestamps is considered consecutive if the time difference is 24 hours or more but less than 48 hours.

The code below reproduces the expected result for the sample dataset:

library(dplyr)
library(magrittr)
df %>% 
  group_by(group) %>% 
  mutate(want = difftime(date, lag(date, default = date[1L]), units = "days") %>% 
           floor() %>% 
           equals(1) %>% 
           not() %>% 
           cumsum() %>% 
           data.table::rowid(),
         want2 = max(want))
# A tibble: 7 x 4
# Groups:   group [2]
  group date                 want want2
  <dbl> <dttm>              <int> <int>
1     1 2000-01-01 00:00:00     1     3
2     1 2000-01-03 00:00:00     1     3
3     1 2000-01-04 07:07:40     2     3
4     1 2000-01-05 09:09:00     3     3
5     2 2000-01-09 00:00:00     1     2
6     2 2000-01-10 14:00:00     2     2
7     2 2000-01-11 13:00:00     1     2

Explanation

df %>% 
  group_by(group) %>% 
  mutate(delta = difftime(date, lag(date, default = date[1L]), units = "days"))

returns

# A tibble: 7 x 5
# Groups:   group [2]
  group date                 want want2 delta         
  <dbl> <dttm>              <dbl> <dbl> <drtn>        
1     1 2000-01-01 00:00:00     1     3 0.0000000 days
2     1 2000-01-03 00:00:00     1     3 2.0000000 days
3     1 2000-01-04 07:07:40     2     3 1.2969907 days
4     1 2000-01-05 09:09:00     3     3 1.0842593 days
5     2 2000-01-09 00:00:00     1     2 0.0000000 days
6     2 2000-01-10 14:00:00     2     2 1.5833333 days
7     2 2000-01-11 13:00:00     1     2 0.9583333 days

By rounding down to the next lower integer (floor()), the logic for the Date case can be used.

Data

library(magrittr)
df <- data.frame(
  group = c(1, 1, 1, 1, 2, 2, 2),
  date = c(
    "2000-01-01 00:00:00",
    "2000-01-03 00:00:00",
    "2000-01-04 07:07:40",
    "2000-01-05 09:09:00",
    "2000-01-09 00:00:00",
    "2000-01-10 14:00:00",
    "2000-01-11 13:00:00"
  ) %>% lubridate::as_datetime(),
  want = c(1, 1, 2, 3, 1, 2, 1),
  want2 = c(3, 3, 3, 3, 2, 2, 2)
)

Upvotes: 1

Related Questions