Using R to create vector containing continuous counts based on date time

Question

I have a data frame in R that looks as follows...

Year  Month  Day  Hour DateTime 
1950  1      4    12   1/4/1950 12:00
1950  1      4    13   1/4/1950 13:00
1950  1      4    14   1/4/1950 14:00
1950  1      4    15   1/4/1950 15:00
1950  1      4    18   1/4/1950 18:00
1950  1      4    21   1/4/1950 21:00
1950  1      4    22   1/4/1950 22:00
1950  1      5    23   1/5/1950 23:00

The data goes from year (1950-2017) with months (1-12) day (1-31) hour (0-23) and I want to create a vector that sums the observations that are continuous, that is the observations that are right after each other. For example, the first few numbers in the created column vector should be (4,1,3), because the first four observations are continuous, the next is not, then the next three are continuous.

Calum You · Accepted Answer

We can use tidyverse tools and cumsum to create group indices for consecutive runs, and then get the number of rows in each group. Note that I do not reproduce your expected output because the last row is not consecutive from the second-to-last row in your data.

library(tidyverse)
library(lubridate)

tbl <- read_table2(
"Year  Month  Day  Hour
1950  1      4    12
1950  1      4    13
1950  1      4    14
1950  1      4    15
1950  1      4    18
1950  1      4    21
1950  1      4    22
1950  1      5    23"
)
tbl %>%
  mutate(
    dt = str_c(Year, Month, Day, Hour, sep = "-") %>% ymd_h(),
    consec_grp = cumsum(dt - lag(dt, default = 0) != hours(1))
  ) %>%
  group_by(consec_grp) %>%
  mutate(consec_num = n())
#> # A tibble: 8 x 7
#> # Groups:   consec_grp [4]
#>    Year Month   Day  Hour dt                  consec_grp consec_num
#>                                
#> 1  1950     1     4    12 1950-01-04 12:00:00          1          4
#> 2  1950     1     4    13 1950-01-04 13:00:00          1          4
#> 3  1950     1     4    14 1950-01-04 14:00:00          1          4
#> 4  1950     1     4    15 1950-01-04 15:00:00          1          4
#> 5  1950     1     4    18 1950-01-04 18:00:00          2          1
#> 6  1950     1     4    21 1950-01-04 21:00:00          3          2
#> 7  1950     1     4    22 1950-01-04 22:00:00          3          2
#> 8  1950     1     5    23 1950-01-05 23:00:00          4          1

Created on 2018-10-02 by the reprex package (v0.2.0).

Using R to create vector containing continuous counts based on date time

Answers (2)

data

Related Questions