Reputation: 29
I have a data frame in R that looks as follows...
Year Month Day Hour DateTime
1950 1 4 12 1/4/1950 12:00
1950 1 4 13 1/4/1950 13:00
1950 1 4 14 1/4/1950 14:00
1950 1 4 15 1/4/1950 15:00
1950 1 4 18 1/4/1950 18:00
1950 1 4 21 1/4/1950 21:00
1950 1 4 22 1/4/1950 22:00
1950 1 5 23 1/5/1950 23:00
The data goes from year (1950-2017) with months (1-12) day (1-31) hour (0-23) and I want to create a vector that sums the observations that are continuous, that is the observations that are right after each other. For example, the first few numbers in the created column vector should be (4,1,3), because the first four observations are continuous, the next is not, then the next three are continuous.
Upvotes: 0
Views: 252
Reputation: 887501
We check the difference between the adjacent dates with difftime
, check if the difference is not equal to 1, use rle
to get the count
dt <- as.POSIXct(df1$DateTime, format = "%m/%d/%Y %H:%M")
rle(cumsum(c(TRUE, difftime(dt[-1], dt[-length(dt)], unit = "hour") != 1)))$lengths
#[1] 4 1 2 1
Or as @Ryan mentioned, diff
can be used as well
rle(cumsum(c(TRUE, diff(dt, unit = "hour") != 1)))$lengths
df1 <- structure(list(Year = c(1950L, 1950L, 1950L, 1950L, 1950L, 1950L,
1950L, 1950L), Month = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Day = c(4L,
4L, 4L, 4L, 4L, 4L, 4L, 5L), Hour = c(12L, 13L, 14L, 15L, 18L,
21L, 22L, 23L), DateTime = c("1/4/1950 12:00", "1/4/1950 13:00",
"1/4/1950 14:00", "1/4/1950 15:00", "1/4/1950 18:00", "1/4/1950 21:00",
"1/4/1950 22:00", "1/5/1950 23:00")), class = "data.frame", row.names = c(NA,
-8L))
Upvotes: 4
Reputation: 15072
We can use tidyverse
tools and cumsum
to create group indices for consecutive runs, and then get the number of rows in each group. Note that I do not reproduce your expected output because the last row is not consecutive from the second-to-last row in your data.
library(tidyverse)
library(lubridate)
tbl <- read_table2(
"Year Month Day Hour
1950 1 4 12
1950 1 4 13
1950 1 4 14
1950 1 4 15
1950 1 4 18
1950 1 4 21
1950 1 4 22
1950 1 5 23"
)
tbl %>%
mutate(
dt = str_c(Year, Month, Day, Hour, sep = "-") %>% ymd_h(),
consec_grp = cumsum(dt - lag(dt, default = 0) != hours(1))
) %>%
group_by(consec_grp) %>%
mutate(consec_num = n())
#> # A tibble: 8 x 7
#> # Groups: consec_grp [4]
#> Year Month Day Hour dt consec_grp consec_num
#> <int> <int> <int> <int> <dttm> <int> <int>
#> 1 1950 1 4 12 1950-01-04 12:00:00 1 4
#> 2 1950 1 4 13 1950-01-04 13:00:00 1 4
#> 3 1950 1 4 14 1950-01-04 14:00:00 1 4
#> 4 1950 1 4 15 1950-01-04 15:00:00 1 4
#> 5 1950 1 4 18 1950-01-04 18:00:00 2 1
#> 6 1950 1 4 21 1950-01-04 21:00:00 3 2
#> 7 1950 1 4 22 1950-01-04 22:00:00 3 2
#> 8 1950 1 5 23 1950-01-05 23:00:00 4 1
Created on 2018-10-02 by the reprex package (v0.2.0).
Upvotes: 3