Reputation: 659
Follow up to this: Create a count-consecutive variable which resets to 1
and the solution worked great. Now I have below, where date is POSixct:
df<-data.frame(group=c(1, 1, 1, 1, 2, 2, 2),
date=c("2000-01-01 00:00:00", "2000-01-03 00:00:00", "2000-01-04 07:07:40", "2000-01-05 09:09:00", "2000-01-09 00:00:00", "2000-01-10 14:00:00", "2000-01-11 13:00:00"),
want=c(1,1,2,3,1,2,1),
want2=c(3,3,3,3,2,2,2))
library(anytime)
df<-df %>% mutate(date = anytime::anytime(str_c(date, sep= ' ')))
group date want want2
1 1 2000-01-01 00:00:00 1 3
2 1 2000-01-03 00:00:00 1 3
3 1 2000-01-04 07:07:40 2 3
4 1 2000-01-05 09:09:00 3 3
5 2 2000-01-09 00:00:00 1 2
6 2 2000-01-10 14:00:00 2 2
7 2 2000-01-11 13:00:00 1 2
I want to begin counting when the 'next day' is after 24 hrs but before 48 hrs.
Trying this without success, because I think the diff function gives me a result in seconds:
df %>%
group_by(group) %>%
group_by(group2 = cumsum(c(TRUE, diff(date)<86400&diff(date)>172800))), add = TRUE) %>%
mutate(wantn = row_number()) %>%
group_by(group) %>%
mutate(want2n = max(wantn)) %>%
select(-group2)
Upvotes: 0
Views: 81
Reputation: 42582
Here, difftime()
is a better choice than diff()
because the units can be specified.
If I understand correctly, a sequence of POSIXct
timestamps is considered consecutive if the time difference is 24 hours or more but less than 48 hours.
The code below reproduces the expected result for the sample dataset:
library(dplyr)
library(magrittr)
df %>%
group_by(group) %>%
mutate(want = difftime(date, lag(date, default = date[1L]), units = "days") %>%
floor() %>%
equals(1) %>%
not() %>%
cumsum() %>%
data.table::rowid(),
want2 = max(want))
# A tibble: 7 x 4 # Groups: group [2] group date want want2 <dbl> <dttm> <int> <int> 1 1 2000-01-01 00:00:00 1 3 2 1 2000-01-03 00:00:00 1 3 3 1 2000-01-04 07:07:40 2 3 4 1 2000-01-05 09:09:00 3 3 5 2 2000-01-09 00:00:00 1 2 6 2 2000-01-10 14:00:00 2 2 7 2 2000-01-11 13:00:00 1 2
df %>%
group_by(group) %>%
mutate(delta = difftime(date, lag(date, default = date[1L]), units = "days"))
returns
# A tibble: 7 x 5 # Groups: group [2] group date want want2 delta <dbl> <dttm> <dbl> <dbl> <drtn> 1 1 2000-01-01 00:00:00 1 3 0.0000000 days 2 1 2000-01-03 00:00:00 1 3 2.0000000 days 3 1 2000-01-04 07:07:40 2 3 1.2969907 days 4 1 2000-01-05 09:09:00 3 3 1.0842593 days 5 2 2000-01-09 00:00:00 1 2 0.0000000 days 6 2 2000-01-10 14:00:00 2 2 1.5833333 days 7 2 2000-01-11 13:00:00 1 2 0.9583333 days
By rounding down to the next lower integer (floor()
), the logic for the Date
case can be used.
library(magrittr)
df <- data.frame(
group = c(1, 1, 1, 1, 2, 2, 2),
date = c(
"2000-01-01 00:00:00",
"2000-01-03 00:00:00",
"2000-01-04 07:07:40",
"2000-01-05 09:09:00",
"2000-01-09 00:00:00",
"2000-01-10 14:00:00",
"2000-01-11 13:00:00"
) %>% lubridate::as_datetime(),
want = c(1, 1, 2, 3, 1, 2, 1),
want2 = c(3, 3, 3, 3, 2, 2, 2)
)
Upvotes: 1