Rahmat Hidayat
Rahmat Hidayat

Reputation: 37

Error in rle function to frequency and duration for at least two consecutive months

I have data recording drought conditions over several months. I want to calculate the total duration of droughts and the average duration of droughts, with the condition that a drought starts when the SPI value falls below <=-1 for at least two consecutive months and ends when the SPI returns to a positive value.

Here's an example of SPI data:

data <- c(-1,-1,-1,-0.5,0,-0.5,-1,0,-1,-1,-0.8,0,0,0)
data[1:2] == conditions are met, drought start
data[1:4]
data[5] == values >= 0, drought stop
Drought Event == 1, with a 4 months of duration
data[5:8] == conditions are not met, at least two consecutive months

data[9:10]== conditions are met, drought start
data[9:11] 
data[12] == values >= 0, drought stop
Drought Event == 1, with a 3 months of duration
Total Drought Event == 2

I have attempted to create a function as follows:

d.frequency <- function(x, na.rm= TRUE) {
  rle_result <- rle(x <= -1)
  drought_runs <- rle_result$values
  drought_lengths <- rle_result$lengths
  # Mencari indeks di mana kekeringan dimulai
  drought_start_indices <- which(drought_runs & c(0, head(drought_runs, -1)) == FALSE)
  # Mencari indeks di mana kekeringan berakhir
  drought_end_indices <- which(drought_runs & c(tail(drought_runs, -1), 0) == FALSE)
  # Menghitung durasi kekeringan minimal 2 bulan
  valid_droughts <- which(drought_lengths >= 2)
  
  # Memfilter kejadian kekeringan yang memenuhi durasi minimal
  valid_drought_start <- drought_start_indices[drought_start_indices %in% valid_droughts]
  valid_drought_end <- drought_end_indices[drought_end_indices %in% valid_droughts]
  
  # Menghitung jumlah dan durasi kekeringan
  num_droughts <- length(valid_drought_start)
  drought_durations <- drought_end_indices[valid_drought_end] - drought_start_indices[valid_drought_start] + 1
  #return(num_droughts)
  # Menyimpan durasi kekeringan untuk keperluan analisis lebih lanjut
  durations_list <- list(num_droughts = num_droughts, durations = sum(drought_durations, na.rm = TRUE))
  
  return(durations_list)
}

And here is the result :

d.frequency(data)
$num_droughts
[1] 2

$durations
[1] 1

However, the result is not as expected. I want to get a total drought duration of 7 months and an average drought duration of total drought duration 7/2 months.

How can I modify this function to meet these requirements?

Thank you very much!

Upvotes: 0

Views: 104

Answers (3)

s_baldur
s_baldur

Reputation: 33488

We assume the data collection starts during a non-drought period. Otherwise change drought_on to TRUE.

foo <- function(x, drought_on = FALSE) {
  stopifnot(is.numeric(x))
  n <- length(x)
  drought <- vector(length = n)
  for (t in seq(1, n-1)) {
    if (drought_on) drought_on <- x[t] < 0
    else            drought_on <- all(x[t:(t+1)] <= -1)
    drought[t] <- drought_on
  }
  drought[n] <- drought_on && x[n] < 0
  total <- sum(drought)
  list(
    "Total duration" = total, 
    "Average length" = total / sum(rle(drought)$values) 
  )
}

foo(data)

# $`Total duration`
# [1] 7
# 
# $`Average length`
# [1] 3.5

Upvotes: 1

r2evans
r2evans

Reputation: 160447

Here's a method with a double-rle:

data <- c(-1,-1,-1,-0.5,0,-0.5,-1,0,-1,-1,-0.8,0.5,1,2)
##        ^^^^^^^^^^^^^                                drought #1
##                                  ^^^^^^^^^^         drought #2

r <- rle(findInterval(data, c(-Inf, -1+1e-9, 0, Inf)))
# from the findInterval: "1"=drought, "3"=not-drought, "2"=not-sure
ind <- which(r$values == 2)
if (length(ind)) r$values[ind] <- r$values[ind-1]

# any "drought-start" that is length-1 should be reset
for (ind2 in rev(which(r$values == 1L & r$lengths < 2))) {
  if (ind2 == 1) next
  r$lengths[ind2-1] <- r$lengths[ind2-1] + r$lengths[ind2]
  r$lengths[ind2] <- 0L
}
r2 <- rle(inverse.rle(r) == 1L)

r2
# Run Length Encoding
#   lengths: int [1:4] 4 4 3 3
#   values : logi [1:4] TRUE FALSE TRUE FALSE

This means that the first drought (first TRUE) is 4 steps long, and the second drought (second TRUE) is 3 steps long.

With this, we can derive your stats:


### total drought duration of 7 months
sum(r2$lengths[r2$values])
# [1] 7

### average drought duration of total drought duration 7/2 months
mean(r2$lengths[r2$values])
# [1] 3.5

Here is a table showing the data, the results of findInterval, some human-generated notes, and then the step number within each drought/not-drought period.

   data findInterval                                  note steps
1  -1.0            1  drought starts (since 1:2 are <= -1)     1
2  -1.0            1                                           2
3  -1.0            1                                           3
4  -0.5            2                                           4
5   0.0            3                         drought stops     1
6  -0.5            2                                           2
7  -1.0            1                                           3
8   0.0            3                                           4
9  -1.0            1 drought starts (since 9:10 are <= -1)     1
10 -1.0            1                                           2
11 -0.8            2                                           3
12  0.5            3                         drought stops     1
13  1.0            3                                           2
14  2.0            3                                           3

Upvotes: 1

Nir Graham
Nir Graham

Reputation: 5137

Here is how I would do it; think of the drought like a state machine, and the possible transitions, and cover them all

dlevels <- c("NO DEFINED DROUGHT", "DROUGHT START", "DROUGHT CONTINUES", "DROUGHT END")



## possible moves

# NDD -> NDD / DS (1)
# DC -> DC (2) / DE (3)
# DE -> NDD (4) / DS (5)
# DS -> DC (5) / DE (6)

dvec <- c(-1,-1,-1,-0.5,0,-0.5,-1,0,-1,-1,-0.8,0.5,1,2) 

info_frame <- data.frame(
  cur = dvec,
  prev = c(NA, head(dvec, n = -1))
)

calculate_droughts <- function(info_frame) {
  for (i in seq_len(nrow(info_frame))) {
    if (i == 1) {
      info_frame[1, "status"] <- "NO DEFINED DROUGHT"
    } else {
      if (i == 16) {
        browser()
      }
      cur <- info_frame[i, "cur"]
      prev <- info_frame[i, "prev"]
      pstat <- info_frame[i - 1, "status"]

      found_status <- NA
      
      if (pstat == "NO DEFINED DROUGHT") { # NDD / DS (1)
        # either it starts or remain undefined
        found_status <- {
          if (cur <= -1 & prev <= -1) {
            "DROUGHT START"
          } else {
            "NO DEFINED DROUGHT"
          }
        }
      } else if (pstat == "DROUGHT CONTINUES") { # DC (2) / DE (3)
        # either it ends or it continues
        found_status <- {
          if (cur > 0 & prev > 0) {
            "DROUGHT END"
          } else {
            "DROUGHT CONTINUES"
          }
        }
      } else if (pstat == "DROUGHT END") { # NDD (4) / DS (5)
        # having previously ended, we are either undefined or starting again
        found_status <- if (cur <= -1 & prev <= -1) {
          "DROUGHT START"
        } else {
          "NO DEFINED DROUGHT"
        }
      } else if (pstat == "DROUGHT START") { # DS -> DC (6) / DE (7)
        # either the initiated drought ends or continues
        found_status <- {
          if (cur > 0 & prev > 0) {
            "DROUGHT END"
          } else {
            "DROUGHT CONTINUES"
          }
        }
      }
      info_frame[i, "status"] <- found_status
    }
  }
  info_frame
}



(info_frame2 <- calculate_droughts(info_frame = info_frame))

rle_version <- rle(info_frame2$status)

library(tidyverse)

(rle_frame <- tibble(
  status = rle_version$values,
  length = rle_version$lengths
))

# we can do some sorts of analysis 
(total_periods <- sum(rle_frame$length))
(total_drought_duration <- filter(rle_frame, status %in% c("DROUGHT START",
                                                           "DROUGHT CONTINUES")) |> pull(length) |> sum())
rle_frame |>
  mutate(unique_drought_counter = cumsum(status == "DROUGHT START")) |>
  group_by(unique_drought_counter) |>
  mutate(particular_drought_length = sum(length * 
                                           (status %in% c("DROUGHT START",
                                                          "DROUGHT CONTINUES"))))

Upvotes: 0

Related Questions