St_learning

Reputation: 11

Find cases of abrupt changes (value differences) within a certain time span with R

I have a multi-annual temperature time series with a datetime column (with irregular time steps), like this:

 daytime             Temperature 
<dttm>                    <dbl>   
1981-10-01 09:00:00        22  
1981-10-02 09:00:00        21.6    
1981-10-03 09:00:00        20.3  
1981-10-04 09:00:00        20.4  
1981-10-05 09:00:00        20.6   
1981-10-05 11:00:00        21    

I would like to find all cases that match two conditions: the temperature difference (max - min) is at least 4°C, and it occurs within a given time span (48 h). In other words, I want to find all rapid, large temperature changes (whether increases or decreases). Ideally, the outcome would be a data frame with one column containing the first date of each detected case and a second column with the associated temperature difference (>= 4°C) of that case.

I tried the Rbeast tools, which detect breakpoints, but they don't seem to be able to detect such short-term variations.

Thank you very much you smart people !!!

Upvotes: 0

Views: 79

Answers (1)

r2evans

Reputation: 160687

I'm going to create my own sample data, since the provided data will never trigger the 4-degree difference. I'll include a gap so that we show it's a rolling 48h window and not a rolling 2-row window.

(I'm inferring you are using dplyr, since your sample data is a tbl_df.)

library(dplyr)
set.seed(2023)
quux <- tibble(
  # drop 10-04 and 10-05 to create a gap longer than 48 hours
  daytime = seq(as.POSIXct("2023-10-01 09:00:00"), length.out = 12, by = "day")[-(4:5)],
  Temperature = 20 + rnorm(10, sd = 3)
)
quux
# # A tibble: 10 × 2
#    daytime             Temperature
#    <dttm>                    <dbl>
#  1 2023-10-01 09:00:00        19.7
#  2 2023-10-02 09:00:00        17.1
#  3 2023-10-03 09:00:00        14.4
#  4 2023-10-06 09:00:00        19.4
#  5 2023-10-07 09:00:00        18.1
#  6 2023-10-08 09:00:00        23.3
#  7 2023-10-09 09:00:00        17.3
#  8 2023-10-10 09:00:00        23.0
#  9 2023-10-11 09:00:00        18.8
# 10 2023-10-12 09:00:00        18.6

Note that since we don't have 10-04 or 10-05, the gap from 14.4 to 19.4 should not trigger anything.
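
A quick way to confirm where that gap sits is to look at the spacing between consecutive timestamps. The sketch below rebuilds only the timestamps in base R, dropping 10-04 and 10-05 as in the printed data. The fixed tz = "UTC" is my assumption, added so every daily step is exactly 24 hours; it is not part of the original sample.

```r
# Timestamps matching the printed sample data: daily at 09:00, without 10-04 and 10-05.
# tz = "UTC" is an assumption made here so that daylight saving cannot shift a step.
daytime <- seq(as.POSIXct("2023-10-01 09:00:00", tz = "UTC"),
               length.out = 12, by = "day")[-(4:5)]

# Hours between each observation and the next one.
gap_hours <- as.numeric(diff(daytime), units = "hours")
gap_hours

# Positions where the next observation is more than 48 hours away.
which(gap_hours > 48)
```

Only the step from 10-03 to 10-06 (72 hours) exceeds 48 hours, so a correctly sized window that starts on 10-03 should contain just that one row.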

From here, we need a rolling window that covers the next 48 hours from each row, however many rows that turns out to be.

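out <- quux |>
  arrange(daytime) |>
  mutate(
    # wid: number of rows (current row included) within the 48 hours after each row
    wid = sapply(row_number(), function(rn) {
      hrs <- cumsum(as.numeric(diff(daytime[rn:n()]), units = "hours"))
      rev(which(hrs <= 48) + 1)[1]
    }),
    # rows with nothing in the next 48 hours give NA above; use a 1-row window
    wid = coalesce(wid, 1L),
    # max - min of Temperature over each row's forward-looking window
    tempdiff = zoo::rollapply(Temperature, width = wid, align = "left", partial = TRUE,
                              FUN = function(z) diff(range(z)))
  )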
out
# # A tibble: 10 × 4
#    daytime             Temperature   wid tempdiff
#    <dttm>                    <dbl> <dbl>    <dbl>
#  1 2023-10-01 09:00:00        19.7     3    5.37 
#  2 2023-10-02 09:00:00        17.1     2    2.68 
#  3 2023-10-03 09:00:00        14.4     1    0    
#  4 2023-10-06 09:00:00        19.4     3    5.17 
#  5 2023-10-07 09:00:00        18.1     3    6.01 
#  6 2023-10-08 09:00:00        23.3     3    6.01 
#  7 2023-10-09 09:00:00        17.3     3    5.75 
#  8 2023-10-10 09:00:00        23.0     3    4.41 
#  9 2023-10-11 09:00:00        18.8     2    0.207
# 10 2023-10-12 09:00:00        18.6     1    0    
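
To see where the wid column comes from, the sapply() body can be unpacked for a single row. This is only a sketch in base R: the helper name window_rows is hypothetical, and tz = "UTC" is my assumption rather than something in the sample data.

```r
# Same timestamps as the printed sample data; tz = "UTC" is an assumption made here.
daytime <- seq(as.POSIXct("2023-10-01 09:00:00", tz = "UTC"),
               length.out = 12, by = "day")[-(4:5)]

# Window size for one row: how many rows, counting row rn itself,
# lie within span_hours after daytime[rn].
# window_rows is a hypothetical helper, not part of dplyr or zoo.
window_rows <- function(rn, span_hours = 48) {
  # Cumulative hours from row rn to each later row.
  hours_ahead <- cumsum(as.numeric(diff(daytime[rn:length(daytime)]), units = "hours"))
  inside <- which(hours_ahead <= span_hours)
  # No later row inside the span (or rn is the last row): the window is row rn alone.
  if (length(inside) == 0) 1 else max(inside) + 1
}

window_rows(1)  # 10-01: 10-02 and 10-03 are within 48 hours
window_rows(2)  # 10-02: only 10-03 follows before the gap
window_rows(3)  # 10-03: the next observation is 72 hours away
```

These calls return 3, 2 and 1, matching wid in the first three rows of the output.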

With that, keeping only the rows at or above 4 degrees is straightforward:

out |>
  filter(tempdiff >= 4)
# # A tibble: 6 × 4
#   daytime             Temperature   wid tempdiff
#   <dttm>                    <dbl> <dbl>    <dbl>
# 1 2023-10-01 09:00:00        19.7     3     5.37
# 2 2023-10-06 09:00:00        19.4     3     5.17
# 3 2023-10-07 09:00:00        18.1     3     6.01
# 4 2023-10-08 09:00:00        23.3     3     6.01
# 5 2023-10-09 09:00:00        17.3     3     5.75
# 6 2023-10-10 09:00:00        23.0     3     4.41

The first step in the mutate() call above defines wid, which is how many rows (including the current row) fall within the next 48 hours; you can see it shrink as a row approaches the 2-day gap I imposed in the sample data. On the day just before the gap, tempdiff is 0 because 14.4 is the only observation in that window, and diff(range(14.4)) is 0.
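
The question asked for a two-column result: the first date of each case and its temperature difference. As a minimal sketch of an alternative that avoids zoo, the forward-looking 48-hour window can also be evaluated directly on the timestamps in base R. The helper name rolling_range is hypothetical, tz = "UTC" is my assumption, and the temperatures are the rounded values from the printout, so the differences only agree with tempdiff to about one decimal place.

```r
# Sample data re-typed from the printed tibble (temperatures rounded to 1 decimal).
# tz = "UTC" is an assumption made here.
quux <- data.frame(
  daytime = seq(as.POSIXct("2023-10-01 09:00:00", tz = "UTC"),
                length.out = 12, by = "day")[-(4:5)],
  Temperature = c(19.7, 17.1, 14.4, 19.4, 18.1, 23.3, 17.3, 23.0, 18.8, 18.6)
)

# For each row, max - min of value over observations in [time, time + span_hours].
# rolling_range is a hypothetical helper, not part of any package.
rolling_range <- function(time, value, span_hours = 48) {
  vapply(seq_along(time), function(i) {
    in_window <- time >= time[i] & time <= time[i] + span_hours * 3600
    diff(range(value[in_window]))
  }, numeric(1))
}

quux$tempdiff <- rolling_range(quux$daytime, quux$Temperature)

# Keep the start of each qualifying window and its temperature difference.
cases <- quux[quux$tempdiff >= 4, c("daytime", "tempdiff")]
cases
```

With the sample data this keeps the same six start dates as the filter() call. Overlapping windows are reported separately, one row per start time, and the row-by-row scan compares every timestamp against every other, so it may be slow on a long multi-annual series.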

Upvotes: 1
