CarlosSR
CarlosSR

Reputation: 103

Difference between last time in a group and last time on previous group

I have a dataframe where the first column is time, and the rest different numeric variables that sometimes have NA.

df <- read.table(text = 
               "Date         column_A      column_B

1               2019-10-15   5000          3500
2               2019-10-16   1000          1200
3               2019-10-17   5000          3500
4               2019-10-18   NA            1200
5               2019-10-19   NA            3500
6               2019-10-20   NA            1200
7               2019-10-21   5000          3500
8               2019-10-22   1000          1200
9               2019-10-23   5000          3500
10              2019-10-30   1000          1200
11              2019-11-01   1000          1200
12              2019-11-02   1000          1200
13              2019-11-03   1000          1200
14              2019-11-04   1000          1200
15              2019-11-05   1000          1200
" , header = TRUE)

df$Date <- as.Date.POSIXct(df$Date, tz= "Europe/Berlin")

What I need to do is to know what is the length of each range of consecutive values or NA's in time. Currently I am doing


df <- df %>% 
      mutate(Var1_interval_grp = cumsum(c(1, abs(diff(is.na(df[, column_A])))))) %>%    
      group_by(Var1_interval_grp) %>% 
      mutate(Range_Var1 = diff(range(Date))) 

That gives me the time difference between the first and the last element of the group, although what I need is the time difference between the last element of a group and the last element of the previous group. This, evidently will cause an isssue on the first group since there is no previous group.

The result for Range_Var1 is 2, 2 and 15 for each group, although what I need is 2, 3 and 16.

Hope the explanation is clear.

Thanks in advance

Upvotes: 1

Views: 50

Answers (2)

akrun
akrun

Reputation: 886938

We could use rleid from data.table to create the group

library(dplyr)
library(data.table)
df %>% 
   mutate(tm = Date - lag(Date, default = first(Date))) %>% 
   group_by(Var1_interval_grp = rleid(is.na(column_A))) %>% 
   mutate(tm = sum(tm))

Upvotes: 1

Onyambu
Onyambu

Reputation: 79188

You are probably looking for something similar to:

df %>% 
    mutate(tm = c(0, diff(Date)))%>%
    group_by(Var1_interval_grp = cumsum(c(1, abs(diff(is.na(column_A))))))%>%
    mutate(tm = sum(tm))
# A tibble: 15 x 5
# Groups:   Var1_interval_grp [3]
   Date       column_A column_B    tm Var1_interval_grp
   <date>        <int>    <int> <dbl>             <dbl>
 1 2019-10-15     5000     3500     2                 1
 2 2019-10-16     1000     1200     2                 1
 3 2019-10-17     5000     3500     2                 1
 4 2019-10-18       NA     1200     3                 2
 5 2019-10-19       NA     3500     3                 2
 6 2019-10-20       NA     1200     3                 2
 7 2019-10-21     5000     3500    16                 3
 8 2019-10-22     1000     1200    16                 3
 9 2019-10-23     5000     3500    16                 3
10 2019-10-30     1000     1200    16                 3
11 2019-11-01     1000     1200    16                 3
12 2019-11-02     1000     1200    16                 3
13 2019-11-03     1000     1200    16                 3
14 2019-11-04     1000     1200    16                 3
15 2019-11-05     1000     1200    16                 3

Upvotes: 3

Related Questions