Reputation: 1489
I have a function that is meant to operate on a vector of numbers, e.g. a vector of temperatures. I want to detect heatwaves (in a very simplified way). Let's say a heatwave starts with three consecutive days above 30 °C.
So I need some running state that stores how long the current heatwave already is. I wrote a function that uses a for-loop internally. In pseudo-code it looks roughly like this:
is_heatwave = function(vals){
  length_heatwave = 0
  # returns a logical vector with the same length as the input vals
  day_in_heatwave = vector(length = length(vals))
  days_in_current_heatwave = c()
  # seq_along() is safe for empty input, unlike 1:length(vals)
  for(i in seq_along(vals)){
    val = vals[[i]]
    if(val > 30){
      length_heatwave = length_heatwave + 1
      days_in_current_heatwave = c(days_in_current_heatwave, i)
    }else{
      length_heatwave = 0
    }
    # ... some more code
  }
  return(day_in_heatwave)
}
This code might be wrong, but the idea is that the function takes a vector with as many elements as the data.frame has rows and returns a vector of the same length.
My idea is to have a function that I can use like this:
library(dplyr)
df = data.frame(
  temps = c(30, 30, 32, 30, 24)
)
df %>% mutate(is_heatwave = is_heatwave(temps))
I just wanted to ask if this generally is a good idea or are there any better ideas?
Upvotes: 0
Views: 68
Reputation: 17648
You can try
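library(dplyr)
set.seed(1)  # arbitrary seed for reproducibility; the output below came from a different, unrecorded draw
df = data.frame(
  temps = sample(25:40, 100, replace = TRUE)
)
df %>%
  mutate(heatwave_length = cumsum(temps >= 30) - cummax((temps < 30) * cumsum(temps >= 30))) %>%
  as_tibble()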
# A tibble: 100 × 2
   temps heatwave_length
   <int>           <int>
 1    33               1
 2    38               2
 3    30               3
 4    35               4
 5    30               5
 6    37               6
 7    35               7
 8    35               8
 9    38               9
10    29               0
The final length of each heatwave (the value on its last day) can then be picked out with something like
mutate(max = ifelse(lead(heatwave_length) == 0, heatwave_length, NA))
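To see why the cumsum/cummax expression counts consecutive hot days, it can help to split it into named steps. The sketch below uses a small toy vector and intermediate names (hot, total_hot, hot_before) that are assumptions for illustration, not part of the answer above.

```r
# Toy data (assumed for illustration)
temps <- c(31, 32, 29, 30, 30, 30, 28)

hot       <- temps >= 30      # TRUE on hot days
total_hot <- cumsum(hot)      # running count of all hot days so far

# On cool days, record the running count; cummax() carries that value
# forward, giving the number of hot days seen up to the latest cool day.
hot_before <- cummax((temps < 30) * total_hot)

# Hot days since the last cool day = length of the current streak
heatwave_length <- total_hot - hot_before

heatwave_length
#> [1] 1 2 0 1 2 3 0
```

The subtraction resets the counter to 0 on every cool day, which is the same effect the for-loop in the question achieves with length_heatwave = 0.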
Upvotes: 2
Reputation: 2132
Already good answers, so let's add some nuances.
This solution gives a unique streak_id to each run of days, which may or may not be a heat_wave. hot_days_acc is the number of hot days accumulated within a streak.
The code:
library(tidyverse)
# -------------------
# Number of days in a heat wave
heat_wave_days <- 3
# Temperature threshold
hot_day <- 30
# Some toy data
set.seed(100)
aux_df <- tibble(temp = sample(-2:2 + hot_day, 50, replace = TRUE))
#
aux_df <- aux_df %>%
  mutate(
    # Flag hot days and label each run of equal flags
    hot_days_acc = temp >= hot_day,
    streak_id = consecutive_id(hot_days_acc)) %>%
  # Run length, stored temporarily in heat_wave
  add_count(streak_id, name = "heat_wave") %>%
  mutate(
    .by = streak_id,
    # A run is a heat wave if it is all hot and long enough
    heat_wave = all(hot_days_acc) & heat_wave >= heat_wave_days) %>%
  # Re-label streaks so that short hot runs merge with their neighbours
  mutate(streak_id = consecutive_id(heat_wave)) %>%
  mutate(.by = streak_id, hot_days_acc = cumsum(hot_days_acc)) %>%
  relocate(temp, streak_id, heat_wave, hot_days_acc)
The output:
> print(aux_df, n = nrow(aux_df))
# A tibble: 50 × 4
    temp streak_id heat_wave hot_days_acc
   <dbl>     <int> <lgl>            <int>
 1    29         1 FALSE                0
 2    30         1 FALSE                1
 3    28         1 FALSE                1
 4    29         1 FALSE                1
 5    31         1 FALSE                2
 6    31         1 FALSE                3
 7    29         1 FALSE                3
 8    30         1 FALSE                4
 9    29         1 FALSE                4
10    32         2 TRUE                 1
11    31         2 TRUE                 2
12    30         2 TRUE                 3
13    30         2 TRUE                 4
14    29         3 FALSE                0
15    28         3 FALSE                0
16    29         3 FALSE                0
17    30         4 TRUE                 1
18    31         4 TRUE                 2
19    31         4 TRUE                 3
20    31         4 TRUE                 4
21    32         4 TRUE                 5
22    30         4 TRUE                 6
23    28         5 FALSE                0
24    30         5 FALSE                1
25    31         5 FALSE                2
26    29         5 FALSE                2
27    32         6 TRUE                 1
28    32         6 TRUE                 2
29    32         6 TRUE                 3
30    28         7 FALSE                0
31    32         8 TRUE                 1
32    31         8 TRUE                 2
33    30         8 TRUE                 3
34    28         9 FALSE                0
35    28         9 FALSE                0
36    28         9 FALSE                0
37    30         9 FALSE                1
38    28         9 FALSE                1
39    28         9 FALSE                1
40    31        10 TRUE                 1
41    30        10 TRUE                 2
42    32        10 TRUE                 3
43    30        10 TRUE                 4
44    31        10 TRUE                 5
45    30        10 TRUE                 6
46    30        10 TRUE                 7
47    30        10 TRUE                 8
48    31        10 TRUE                 9
49    30        10 TRUE                10
50    32        10 TRUE                11
Upvotes: 2
Reputation: 3247
You can do this very concisely with dplyr::consecutive_id(), which creates a grouping variable that increments whenever another variable changes. By creating a variable that marks hot days, we can form groups that correspond to alternating hot and cold spells. We can then count the number of days in each group or determine which day of a heatwave we are in:
library(dplyr)
df <- data.frame(temps = c(30, 30, 30, 29, 30, 29, 29, 30, 29, 30, 30))
df <- mutate(df,
             hot = temps >= 30,
             wave = consecutive_id(hot)) |>
  mutate(heatwave_length = sum(hot),
         wave_day = 1:n() |>
           replace(!hot, NA),
         .by = wave) |>
  select(temps, heatwave_length, wave_day)
df
#>    temps heatwave_length wave_day
#> 1     30               3        1
#> 2     30               3        2
#> 3     30               3        3
#> 4     29               0       NA
#> 5     30               1        1
#> 6     29               0       NA
#> 7     29               0       NA
#> 8     30               1        1
#> 9     29               0       NA
#> 10    30               2        1
#> 11    30               2        2
Created on 2024-04-09 with reprex v2.1.0
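Since the question asks for a function that takes a vector and returns a vector of the same length, the same run-based idea can be wrapped in a helper. This is only a sketch: it swaps consecutive_id() for base R's rle(), and the argument names threshold and min_days are assumptions chosen for illustration. It uses >= 30 like the answers here, so change the comparison to > if "above 30" is meant strictly.

```r
# Hypothetical helper: TRUE for every day that belongs to a run of at least
# `min_days` consecutive days at or above `threshold`.
is_heatwave <- function(vals, threshold = 30, min_days = 3) {
  hot <- vals >= threshold
  runs <- rle(hot)                            # lengths/values of consecutive runs
  run_len <- rep(runs$lengths, runs$lengths)  # run length repeated for each day
  hot & run_len >= min_days
}

temps <- c(30, 30, 30, 29, 30, 29, 29, 30, 29, 30, 30)
is_heatwave(temps)
#>  [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
```

Because the result has the same length as the input, it can be used as in the question: df %>% mutate(is_heatwave = is_heatwave(temps)).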
How does consecutive_id() work?
Simple:
1. Create a lagged version of x (x_lag[n] is equal to x[n-1]).
2. Check whether x differs from its lag (x != x_lag), defaulting to TRUE for x[1].
3. Wherever the result is TRUE, the value has changed. We can therefore create a cumulative sum of our new variable, which will increment by 1 every time the group changes.
Here it is in base R:
base_consecutive_id <- function(x){
  len <- length(x)
  if (len == 0) return(integer(0))
  # Negative indexing avoids the 2:len pitfall when len is 1
  c(TRUE, x[-1] != x[-len]) |>
    cumsum()
}
Created on 2024-04-09 with reprex v2.1.0
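To make the lag-compare-cumsum steps concrete, here is a small worked trace on a toy logical vector. The variable names (hot_lag, changed, run_id) are assumptions for illustration, and the cross-check against rle() is an extra sanity test rather than part of the original answer.

```r
hot <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)

# Step 1: previous element for each position (NA where there is none)
hot_lag <- c(NA, hot[-length(hot)])

# Step 2: TRUE where the value differs from its predecessor; first is TRUE
changed <- c(TRUE, hot[-1] != hot_lag[-1])

# Step 3: the cumulative sum turns change flags into run ids
run_id <- cumsum(changed)
run_id
#> [1] 1 1 1 2 3 4 4 5

# Cross-check: the same ids rebuilt from run lengths
runs <- rle(hot)
identical(run_id, rep(seq_along(runs$lengths), runs$lengths))
#> [1] TRUE
```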
Upvotes: 3