I have a multi-annual temperature time series with a datetime column (with irregular time steps), like this:
```
daytime             Temperature
<dttm>                    <dbl>
1981-10-01 09:00:00        22
1981-10-02 09:00:00        21.6
1981-10-03 09:00:00        20.3
1981-10-04 09:00:00        20.4
1981-10-05 09:00:00        20.6
1981-10-05 11:00:00        21
```
I would like to find all cases that match two conditions: the temperature difference (max - min) is at least 4 °C, and it occurs within a given time span (48 h). In other words, I want to find all rapid and large temperature changes (whether increases or decreases). Ideally, the outcome would be a data frame with one column containing the first date of each detected case and a second column with the associated temperature difference (>= 4 °C).
I tried the Rbeast package, which detects breakpoints, but it doesn't seem able to detect such short-term variations.
Thank you very much, you smart people!
I'm going to create my own sample data, since the provided data will never trigger the 4-degree difference. I'll include a gap to show that this is a rolling 48-hour window and not a rolling window with a fixed number of rows.
(I'm inferring you are using `dplyr` since you have `tbl_df` sample data.)
```r
library(dplyr)
set.seed(2023)
quux <- tibble(
  # 12 daily timestamps, minus 10-04 and 10-05 to create a 2-day gap
  daytime = seq(as.POSIXct("2023-10-01 09:00:00"), length.out = 12, by = "day")[-(4:5)],
  Temperature = 20 + rnorm(10, sd = 3)
)
quux
# # A tibble: 10 × 2
#    daytime             Temperature
#    <dttm>                    <dbl>
#  1 2023-10-01 09:00:00        19.7
#  2 2023-10-02 09:00:00        17.1
#  3 2023-10-03 09:00:00        14.4
#  4 2023-10-06 09:00:00        19.4
#  5 2023-10-07 09:00:00        18.1
#  6 2023-10-08 09:00:00        23.3
#  7 2023-10-09 09:00:00        17.3
#  8 2023-10-10 09:00:00        23.0
#  9 2023-10-11 09:00:00        18.8
# 10 2023-10-12 09:00:00        18.6
```
Note that since we don't have `10-04` or `10-05`, the jump from `14.4` to `19.4` should not trigger anything.
From here, we need a rolling window whose width is defined in time ("48 hours") rather than in rows.
```r
out <- quux |>
  arrange(daytime) |>
  mutate(
    # wid: number of rows (current row included) whose timestamps fall
    # within 48 hours after the current row
    wid = sapply(row_number(), function(rn)
      rev(which(cumsum(as.numeric(diff(daytime[rn:n()]), units = "hours")) <= 48) + 1)[1]),
    # the last row, or a row followed by a >48h gap, yields NA: a window of 1 row
    wid = coalesce(wid, 1L),
    # left-aligned rolling max - min over each row's own window (needs zoo installed)
    tempdiff = zoo::rollapply(Temperature, width = wid, align = "left", partial = TRUE,
                              FUN = function(z) diff(range(z)))
  )
out
# # A tibble: 10 × 4
#    daytime             Temperature   wid tempdiff
#    <dttm>                    <dbl> <dbl>    <dbl>
#  1 2023-10-01 09:00:00        19.7     3    5.37
#  2 2023-10-02 09:00:00        17.1     2    2.68
#  3 2023-10-03 09:00:00        14.4     1    0
#  4 2023-10-06 09:00:00        19.4     3    5.17
#  5 2023-10-07 09:00:00        18.1     3    6.01
#  6 2023-10-08 09:00:00        23.3     3    6.01
#  7 2023-10-09 09:00:00        17.3     3    5.75
#  8 2023-10-10 09:00:00        23.0     3    4.41
#  9 2023-10-11 09:00:00        18.8     2    0.207
# 10 2023-10-12 09:00:00        18.6     1    0
```
With that, keeping only the rows with a difference of at least 4 degrees is straightforward:
```r
out |>
  filter(tempdiff >= 4)
# # A tibble: 6 × 4
#   daytime             Temperature   wid tempdiff
#   <dttm>                    <dbl> <dbl>    <dbl>
# 1 2023-10-01 09:00:00        19.7     3     5.37
# 2 2023-10-06 09:00:00        19.4     3     5.17
# 3 2023-10-07 09:00:00        18.1     3     6.01
# 4 2023-10-08 09:00:00        23.3     3     6.01
# 5 2023-10-09 09:00:00        17.3     3     5.75
# 6 2023-10-10 09:00:00        23.0     3     4.41
```
The first step above defines `wid` (the window width), which indicates how many rows (including the current row) fall within the "next 48 hours"; you can see that it shrinks as it approaches the 2-day gap I imposed in the sample data. On the day just before the gap, `tempdiff` is 0, because `14.4` is the only observation in that window, and `diff(range(14.4))` is 0.
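The `sapply()` call above scans all remaining rows for every row, so its cost grows quadratically with the length of the series, which may be noticeable on multi-annual data. Below is a minimal base-R sketch of the same idea, assuming the timestamps are POSIXct, sorted ascending, and free of `NA`s. `findInterval()` counts, for each row, how many timestamps are at or before that row's time plus 48 hours, which gives the same window sizes as `wid` in one vectorized call. The function names `window_rows()` and `rapid_changes()` and the example temperatures are made up for illustration, and the result has the two columns asked for in the question (window start and temperature difference).

```r
# Hypothetical helpers (not from any package): a base-R sketch assuming
# `times` is POSIXct, sorted ascending, and free of NAs.

# For each row, count the rows whose timestamp lies within `hours` hours
# after it (the current row included).
window_rows <- function(times, hours = 48) {
  secs <- as.numeric(times)  # POSIXct -> seconds since the epoch
  # findInterval(limit, secs) = number of timestamps <= limit
  findInterval(secs + hours * 3600, secs) - seq_along(secs) + 1L
}

# Keep every window start whose max - min temperature reaches `threshold`.
rapid_changes <- function(times, temp, hours = 48, threshold = 4) {
  wid <- window_rows(times, hours)
  tempdiff <- vapply(seq_along(temp), function(i) {
    diff(range(temp[i:(i + wid[i] - 1L)]))
  }, numeric(1))
  keep <- which(tempdiff >= threshold)  # which() also drops NA differences
  data.frame(start = times[keep], tempdiff = tempdiff[keep])
}

# Made-up example: daily readings with 2023-10-04 and 2023-10-05 missing
times <- seq(as.POSIXct("2023-10-01 09:00:00", tz = "UTC"),
             length.out = 12, by = "day")[-(4:5)]
temp <- c(20, 21, 25, 20, 20.5, 19, 24, 24, 22, 22)
window_rows(times)
# 3 2 1 3 3 3 3 3 2 1
rapid_changes(times, temp)
# flags the windows starting 10-01, 10-02, 10-07 and 10-08 (differences 5, 4, 5, 5)
```

Applied to the sample data above, `rapid_changes(quux$daytime, quux$Temperature)` should flag the same start times as the `filter()` step, since `window_rows()` reproduces the `wid` column; it just avoids the zoo dependency and the quadratic scan.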