Tristan Lorino
Tristan Lorino

Reputation: 31

Using dplyr for multiple mutate on different criteria

This is my repex:

dates <- seq(as.POSIXct("2015-01-01 13:10:00", tz = "UTC"), as.POSIXct("2015-01-01 13:10:10", tz="UTC"), by="1 sec")
dates[dst(dates)] <- dates[dst(dates)] - 3600
datavalues <- data.frame(x=c(90,90,80,65,NA,64,71,75,62,63,74))
data <- cbind(dates,datavalues)
data

                 dates  x
1  2015-01-01 13:10:00 90
2  2015-01-01 13:10:01 90
3  2015-01-01 13:10:02 80
4  2015-01-01 13:10:03 65
5  2015-01-01 13:10:04 NA
6  2015-01-01 13:10:05 64
7  2015-01-01 13:10:06 71
8  2015-01-01 13:10:07 75
9  2015-01-01 13:10:08 62
10 2015-01-01 13:10:09 63
11 2015-01-01 13:10:10 74

I would have to obtain the following data frame (which I will concatenate to data):

results <- data.frame(Duration=c(3,3,3,0,0,0,2,2,0,0,1),Maxx=c(90,90,90,0,0,0,75,75,0,0,74),Delta=c(0,0,0,0,0,0,7,0,0,0,11))
results
   Duration Maxx Delta
1         3   90     0
2         3   90     0
3         3   90     0
4         0    0     0
5         0    0     0
6         0    0     0
7         2   75     7
8         2   75     0
9         0    0     0
10        0    0     0
11        1   74    11

I fix a threshold to 70.

I would like if possible to get code using dplyr because around this pice of code, there is already dplyr code. Thank you in advance.

Upvotes: 1

Views: 67

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389175

With the help of data.table rleid you can create group of consecutive values which are above or below the threshold and calculate the numbers in each group.

library(dplyr)
library(data.table)

threshold <- 70

data %>%
  #Create a unique group of consecutive values
  group_by(group = rleid(replace(x, is.na(x), 0) < threshold)) %>% 
  #If the value is less than threshold put 0 in duration or else
  #include number of observations in the group. Do the same for max value.
  mutate(Duration = if_else(x < threshold, 0L, n(), missing = 0L), 
         #+(Duration > 0) is used to turn values less than threshold to 0
         Maxx = max(x, na.rm = TRUE) * +(Duration > 0)) %>%
  ungroup() %>%
  #Subtract current value with previous value
  mutate(Delta = x - lag(x), 
         #Keep only those values that are first row in each group
         Delta = replace(Delta, group == lag(group, default = first(group)) | 
                                Duration == 0, 0)) %>%
  select(-group)


#   dates                   x Duration  Maxx Delta
#   <dttm>              <dbl>    <int> <dbl> <dbl>
# 1 2015-01-01 13:10:00    90        3    90     0
# 2 2015-01-01 13:10:01    90        3    90     0
# 3 2015-01-01 13:10:02    80        3    90     0
# 4 2015-01-01 13:10:03    65        0     0     0
# 5 2015-01-01 13:10:04    NA        0     0     0
# 6 2015-01-01 13:10:05    64        0     0     0
# 7 2015-01-01 13:10:06    71        2    75     7
# 8 2015-01-01 13:10:07    75        2    75     0
# 9 2015-01-01 13:10:08    62        0     0     0
#10 2015-01-01 13:10:09    63        0     0     0
#11 2015-01-01 13:10:10    74        1    74    11

Upvotes: 1

Related Questions