Reputation: 37
I have data recording drought conditions over several months. I want to calculate the total duration of droughts and the average duration of droughts, with the condition that a drought starts when the SPI value falls below <=-1 for at least two consecutive months and ends when the SPI returns to a positive value.
Here's an example of SPI data:
data <- c(-1,-1,-1,-0.5,0,-0.5,-1,0,-1,-1,-0.8,0,0,0)
data[1:2] == conditions are met, drought start
data[1:4]
data[5] == values >= 0, drought stop
Drought Event == 1, with a 4 months of duration
data[5:8] == conditions are not met, at least two consecutive months
data[9:10]== conditions are met, drought start
data[9:11]
data[12] == values >= 0, drought stop
Drought Event == 1, with a 3 months of duration
Total Drought Event == 2
I have attempted to create a function as follows:
d.frequency <- function(x, na.rm= TRUE) {
rle_result <- rle(x <= -1)
drought_runs <- rle_result$values
drought_lengths <- rle_result$lengths
# Mencari indeks di mana kekeringan dimulai
drought_start_indices <- which(drought_runs & c(0, head(drought_runs, -1)) == FALSE)
# Mencari indeks di mana kekeringan berakhir
drought_end_indices <- which(drought_runs & c(tail(drought_runs, -1), 0) == FALSE)
# Menghitung durasi kekeringan minimal 2 bulan
valid_droughts <- which(drought_lengths >= 2)
# Memfilter kejadian kekeringan yang memenuhi durasi minimal
valid_drought_start <- drought_start_indices[drought_start_indices %in% valid_droughts]
valid_drought_end <- drought_end_indices[drought_end_indices %in% valid_droughts]
# Menghitung jumlah dan durasi kekeringan
num_droughts <- length(valid_drought_start)
drought_durations <- drought_end_indices[valid_drought_end] - drought_start_indices[valid_drought_start] + 1
#return(num_droughts)
# Menyimpan durasi kekeringan untuk keperluan analisis lebih lanjut
durations_list <- list(num_droughts = num_droughts, durations = sum(drought_durations, na.rm = TRUE))
return(durations_list)
}
And here is the result :
d.frequency(data)
$num_droughts
[1] 2
$durations
[1] 1
However, the result is not as expected. I want to get a total drought duration of 7 months and an average drought duration of total drought duration 7/2 months.
How can I modify this function to meet these requirements?
Thank you very much!
Upvotes: 0
Views: 104
Reputation: 33488
We assume the data collection starts during a non-drought period. Otherwise change drought_on to TRUE
.
foo <- function(x, drought_on = FALSE) {
stopifnot(is.numeric(x))
n <- length(x)
drought <- vector(length = n)
for (t in seq(1, n-1)) {
if (drought_on) drought_on <- x[t] < 0
else drought_on <- all(x[t:(t+1)] <= -1)
drought[t] <- drought_on
}
drought[n] <- drought_on && x[n] < 0
total <- sum(drought)
list(
"Total duration" = total,
"Average length" = total / sum(rle(drought)$values)
)
}
foo(data)
# $`Total duration`
# [1] 7
#
# $`Average length`
# [1] 3.5
Upvotes: 1
Reputation: 160447
Here's a method with a double-rle:
data <- c(-1,-1,-1,-0.5,0,-0.5,-1,0,-1,-1,-0.8,0.5,1,2)
## ^^^^^^^^^^^^^ drought #1
## ^^^^^^^^^^ drought #2
r <- rle(findInterval(data, c(-Inf, -1+1e-9, 0, Inf)))
# from the findInterval: "1"=drought, "3"=not-drought, "2"=not-sure
ind <- which(r$values == 2)
if (length(ind)) r$values[ind] <- r$values[ind-1]
# any "drought-start" that is length-1 should be reset
for (ind2 in rev(which(r$values == 1L & r$lengths < 2))) {
if (ind2 == 1) next
r$lengths[ind2-1] <- r$lengths[ind2-1] + r$lengths[ind2]
r$lengths[ind2] <- 0L
}
r2 <- rle(inverse.rle(r) == 1L)
r2
# Run Length Encoding
# lengths: int [1:4] 4 4 3 3
# values : logi [1:4] TRUE FALSE TRUE FALSE
This means that the first drought (first TRUE
) is 4
steps long, and the second drought (second TRUE
) is 3 steps long.
With this, we can derive your stats:
### total drought duration of 7 months
sum(r2$lengths[r2$values])
# [1] 7
### average drought duration of total drought duration 7/2 months
mean(r2$lengths[r2$values])
# [1] 3.5
Here is a table showing the data
, the results of findInterval
, some human-generated notes, and then the step number within each drought/not-drought period.
data findInterval note steps
1 -1.0 1 drought starts (since 1:2 are <= -1) 1
2 -1.0 1 2
3 -1.0 1 3
4 -0.5 2 4
5 0.0 3 drought stops 1
6 -0.5 2 2
7 -1.0 1 3
8 0.0 3 4
9 -1.0 1 drought starts (since 9:10 are <= -1) 1
10 -1.0 1 2
11 -0.8 2 3
12 0.5 3 drought stops 1
13 1.0 3 2
14 2.0 3 3
Upvotes: 1
Reputation: 5137
Here is how I would do it; think of the drought like a state machine, and the possible transitions, and cover them all
dlevels <- c("NO DEFINED DROUGHT", "DROUGHT START", "DROUGHT CONTINUES", "DROUGHT END")
## possible moves
# NDD -> NDD / DS (1)
# DC -> DC (2) / DE (3)
# DE -> NDD (4) / DS (5)
# DS -> DC (5) / DE (6)
dvec <- c(-1,-1,-1,-0.5,0,-0.5,-1,0,-1,-1,-0.8,0.5,1,2)
info_frame <- data.frame(
cur = dvec,
prev = c(NA, head(dvec, n = -1))
)
calculate_droughts <- function(info_frame) {
for (i in seq_len(nrow(info_frame))) {
if (i == 1) {
info_frame[1, "status"] <- "NO DEFINED DROUGHT"
} else {
if (i == 16) {
browser()
}
cur <- info_frame[i, "cur"]
prev <- info_frame[i, "prev"]
pstat <- info_frame[i - 1, "status"]
found_status <- NA
if (pstat == "NO DEFINED DROUGHT") { # NDD / DS (1)
# either it starts or remain undefined
found_status <- {
if (cur <= -1 & prev <= -1) {
"DROUGHT START"
} else {
"NO DEFINED DROUGHT"
}
}
} else if (pstat == "DROUGHT CONTINUES") { # DC (2) / DE (3)
# either it ends or it continues
found_status <- {
if (cur > 0 & prev > 0) {
"DROUGHT END"
} else {
"DROUGHT CONTINUES"
}
}
} else if (pstat == "DROUGHT END") { # NDD (4) / DS (5)
# having previously ended, we are either undefined or starting again
found_status <- if (cur <= -1 & prev <= -1) {
"DROUGHT START"
} else {
"NO DEFINED DROUGHT"
}
} else if (pstat == "DROUGHT START") { # DS -> DC (6) / DE (7)
# either the initiated drought ends or continues
found_status <- {
if (cur > 0 & prev > 0) {
"DROUGHT END"
} else {
"DROUGHT CONTINUES"
}
}
}
info_frame[i, "status"] <- found_status
}
}
info_frame
}
(info_frame2 <- calculate_droughts(info_frame = info_frame))
rle_version <- rle(info_frame2$status)
library(tidyverse)
(rle_frame <- tibble(
status = rle_version$values,
length = rle_version$lengths
))
# we can do some sorts of analysis
(total_periods <- sum(rle_frame$length))
(total_drought_duration <- filter(rle_frame, status %in% c("DROUGHT START",
"DROUGHT CONTINUES")) |> pull(length) |> sum())
rle_frame |>
mutate(unique_drought_counter = cumsum(status == "DROUGHT START")) |>
group_by(unique_drought_counter) |>
mutate(particular_drought_length = sum(length *
(status %in% c("DROUGHT START",
"DROUGHT CONTINUES"))))
Upvotes: 0