Reputation: 103
I have a dataframe where the first column is time, and the rest different numeric variables that sometimes have NA.
df <- read.table(text =
"Date column_A column_B
1 2019-10-15 5000 3500
2 2019-10-16 1000 1200
3 2019-10-17 5000 3500
4 2019-10-18 NA 1200
5 2019-10-19 NA 3500
6 2019-10-20 NA 1200
7 2019-10-21 5000 3500
8 2019-10-22 1000 1200
9 2019-10-23 5000 3500
10 2019-10-30 1000 1200
11 2019-11-01 1000 1200
12 2019-11-02 1000 1200
13 2019-11-03 1000 1200
14 2019-11-04 1000 1200
15 2019-11-05 1000 1200
" , header = TRUE)
df$Date <- as.Date.POSIXct(df$Date, tz= "Europe/Berlin")
What I need to do is to know what is the length of each range of consecutive values or NA's in time. Currently I am doing
df <- df %>%
mutate(Var1_interval_grp = cumsum(c(1, abs(diff(is.na(df[, column_A])))))) %>%
group_by(Var1_interval_grp) %>%
mutate(Range_Var1 = diff(range(Date)))
That gives me the time difference between the first and the last element of the group, although what I need is the time difference between the last element of a group and the last element of the previous group. This, evidently will cause an isssue on the first group since there is no previous group.
The result for Range_Var1 is 2, 2 and 15 for each group, although what I need is 2, 3 and 16.
Hope the explanation is clear.
Thanks in advance
Upvotes: 1
Views: 50
Reputation: 886938
We could use rleid
from data.table
to create the group
library(dplyr)
library(data.table)
df %>%
mutate(tm = Date - lag(Date, default = first(Date))) %>%
group_by(Var1_interval_grp = rleid(is.na(column_A))) %>%
mutate(tm = sum(tm))
Upvotes: 1
Reputation: 79188
You are probably looking for something similar to:
df %>%
mutate(tm = c(0, diff(Date)))%>%
group_by(Var1_interval_grp = cumsum(c(1, abs(diff(is.na(column_A))))))%>%
mutate(tm = sum(tm))
# A tibble: 15 x 5
# Groups: Var1_interval_grp [3]
Date column_A column_B tm Var1_interval_grp
<date> <int> <int> <dbl> <dbl>
1 2019-10-15 5000 3500 2 1
2 2019-10-16 1000 1200 2 1
3 2019-10-17 5000 3500 2 1
4 2019-10-18 NA 1200 3 2
5 2019-10-19 NA 3500 3 2
6 2019-10-20 NA 1200 3 2
7 2019-10-21 5000 3500 16 3
8 2019-10-22 1000 1200 16 3
9 2019-10-23 5000 3500 16 3
10 2019-10-30 1000 1200 16 3
11 2019-11-01 1000 1200 16 3
12 2019-11-02 1000 1200 16 3
13 2019-11-03 1000 1200 16 3
14 2019-11-04 1000 1200 16 3
15 2019-11-05 1000 1200 16 3
Upvotes: 3