Vinny Brown
Vinny Brown

Reputation: 29

Using R to create vector containing continuous counts based on date time

I have a data frame in R that looks as follows...

Year  Month  Day  Hour DateTime 
1950  1      4    12   1/4/1950 12:00
1950  1      4    13   1/4/1950 13:00
1950  1      4    14   1/4/1950 14:00
1950  1      4    15   1/4/1950 15:00
1950  1      4    18   1/4/1950 18:00
1950  1      4    21   1/4/1950 21:00
1950  1      4    22   1/4/1950 22:00
1950  1      5    23   1/5/1950 23:00

The data goes from year (1950-2017) with months (1-12) day (1-31) hour (0-23) and I want to create a vector that sums the observations that are continuous, that is the observations that are right after each other. For example, the first few numbers in the created column vector should be (4,1,3), because the first four observations are continuous, the next is not, then the next three are continuous.

Upvotes: 0

Views: 252

Answers (2)

akrun
akrun

Reputation: 887501

We check the difference between the adjacent dates with difftime, check if the difference is not equal to 1, use rle to get the count

dt <- as.POSIXct(df1$DateTime, format = "%m/%d/%Y %H:%M")
rle(cumsum(c(TRUE, difftime(dt[-1], dt[-length(dt)], unit = "hour") != 1)))$lengths
#[1] 4 1 2 1

Or as @Ryan mentioned, diff can be used as well

rle(cumsum(c(TRUE, diff(dt, unit = "hour") != 1)))$lengths

data

df1 <- structure(list(Year = c(1950L, 1950L, 1950L, 1950L, 1950L, 1950L, 
1950L, 1950L), Month = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Day = c(4L, 
4L, 4L, 4L, 4L, 4L, 4L, 5L), Hour = c(12L, 13L, 14L, 15L, 18L, 
21L, 22L, 23L), DateTime = c("1/4/1950 12:00", "1/4/1950 13:00", 
"1/4/1950 14:00", "1/4/1950 15:00", "1/4/1950 18:00", "1/4/1950 21:00", 
"1/4/1950 22:00", "1/5/1950 23:00")), class = "data.frame", row.names = c(NA, 
 -8L))

Upvotes: 4

Calum You
Calum You

Reputation: 15072

We can use tidyverse tools and cumsum to create group indices for consecutive runs, and then get the number of rows in each group. Note that I do not reproduce your expected output because the last row is not consecutive from the second-to-last row in your data.

library(tidyverse)
library(lubridate)

tbl <- read_table2(
"Year  Month  Day  Hour
1950  1      4    12
1950  1      4    13
1950  1      4    14
1950  1      4    15
1950  1      4    18
1950  1      4    21
1950  1      4    22
1950  1      5    23"
)
tbl %>%
  mutate(
    dt = str_c(Year, Month, Day, Hour, sep = "-") %>% ymd_h(),
    consec_grp = cumsum(dt - lag(dt, default = 0) != hours(1))
  ) %>%
  group_by(consec_grp) %>%
  mutate(consec_num = n())
#> # A tibble: 8 x 7
#> # Groups:   consec_grp [4]
#>    Year Month   Day  Hour dt                  consec_grp consec_num
#>   <int> <int> <int> <int> <dttm>                   <int>      <int>
#> 1  1950     1     4    12 1950-01-04 12:00:00          1          4
#> 2  1950     1     4    13 1950-01-04 13:00:00          1          4
#> 3  1950     1     4    14 1950-01-04 14:00:00          1          4
#> 4  1950     1     4    15 1950-01-04 15:00:00          1          4
#> 5  1950     1     4    18 1950-01-04 18:00:00          2          1
#> 6  1950     1     4    21 1950-01-04 21:00:00          3          2
#> 7  1950     1     4    22 1950-01-04 22:00:00          3          2
#> 8  1950     1     5    23 1950-01-05 23:00:00          4          1

Created on 2018-10-02 by the reprex package (v0.2.0).

Upvotes: 3

Related Questions