I've been struggling with this for a while and could use some help.
I am trying to look at changes in consecutive dry days in a meteorological dataset. My dataframe has two columns, date and precipitation, and it looks something like this:
date <- seq(as.Date("2000-01-01"), as.Date("2000-01-30"), by="days")
precipitation <- c(1,0,0,0,5,4,7,0,5,0,0,0,0,0,0,8,4,0,0,2,0,0,2,4,5,1,1,0,0,3)
Data <- data.frame(date, precipitation)
I'd like to create a new column that shows the length of each stretch of consecutive days where precipitation == 0. In other words, I would like the output to look something like this:
dry.streak <- c(0,3,0,0,0,0,0,1,0,6,0,0,0,0,0,0,0,2,0,0,2,0,0,0,0,0,0,2,0,0)
Data <- data.frame(date, precipitation, dry.streak)
Here, the value in dry.streak is the number of consecutive days with precipitation == 0, counted from that day forward until it rains again (precipitation > 0). It should appear only on the first day of each dry streak. For every other day the value should be zero, NA, or something else I can easily filter out, so that I can look at each streak independently when I run statistics on this.
So far, I've used dry.streak2 <- with(rle(as.logical(Data$precipitation==0)), lengths[values]) to create a separate vector of the length of each dry streak, but I don't know how to match it up with my dates, which I'll need in order to look at changes in the length of dry streaks over time.
I'd appreciate any help you can give. Thank you!
Not the most elegant solution, but it gets the job done. Use a for loop (or sapply if you like) and think through the steps of the algorithm clearly, translating them into if-else statements. We need to start a counter on the first dry day after a wet day and keep track of that first day. Then, when the streak ends, store the total number of days on that first day and reset the counter.
date <- seq(as.Date("2000-01-01"), as.Date("2000-01-30"), by = "days")
precipitation <- c(1,0,0,0,5,4,7,0,5,0,0,0,0,0,0,8,4,0,0,2,0,0,2,4,5,1,1,0,0,3)
Data <- data.frame(date, precipitation)

count <- 0                                        # number of dry days in the current streak
startLoc <- NA                                    # index of the first day of the current streak
dry.streak <- rep(0, length(Data$precipitation))  # storage vector

# loop through the precip vector
for (i in seq_along(Data$precipitation)) {
  p <- Data$precipitation[i]                      # get the current value
  if (count == 0 && p == 0) {                     # the previous day was not dry but today is
    startLoc <- i                                 # mark the start of the dry spell
    count <- count + 1                            # add one to the count
  } else if (count != 0 && p == 0) {              # the dry spell is continuing
    count <- count + 1                            # add one to the count
  } else if (count != 0) {                        # first wet day after a dry spell
    dry.streak[startLoc] <- count                 # store the count on the starting day
    count <- 0                                    # reset the counter
  }
}
# a dry spell that lasts until the final day never meets a wet day, so store it here
if (count != 0) dry.streak[startLoc] <- count

Data$dry.streak <- dry.streak                     # line the result up with the dates
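If you'd rather avoid the loop, the rle() call from the question can be matched up with the dates by also working out where each run starts: the first row of run k is cumsum(lengths)[k] - lengths[k] + 1. The sketch below is one way to do that, assuming the same example data and no missing precipitation values; the names runs and starts are my own. It writes each dry run's length on its first day and leaves 0 everywhere else, including for a dry run at the very end of the series.

```r
date <- seq(as.Date("2000-01-01"), as.Date("2000-01-30"), by = "days")
precipitation <- c(1,0,0,0,5,4,7,0,5,0,0,0,0,0,0,8,4,0,0,2,0,0,2,4,5,1,1,0,0,3)
Data <- data.frame(date, precipitation)

runs <- rle(Data$precipitation == 0)               # runs of dry (TRUE) and wet (FALSE) days
starts <- cumsum(runs$lengths) - runs$lengths + 1  # row index where each run begins

dry.streak <- rep(0, nrow(Data))                   # 0 for days that do not start a dry run
dry.streak[starts[runs$values]] <- runs$lengths[runs$values]  # dry-run length on its first day
Data$dry.streak <- dry.streak
```

On the example data this gives the same dry.streak vector as the loop above.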
Here is my take on the problem. I provided three possible outputs (streaks, streaks_max and streaks_keep_first) because I wasn't sure exactly what you needed:
library(tidyverse)

date <- seq(as.Date("2000-01-01"), as.Date("2000-01-30"), by = "days")
precipitation <- c(1,0,0,0,5,4,7,0,5,0,0,0,0,0,0,8,4,0,0,2,0,0,2,4,5,1,1,0,0,3)
Data <- data.frame(date, precipitation)
dry.streak <- c(0,3,0,0,0,0,0,1,0,6,0,0,0,0,0,0,0,2,0,0,2,0,0,0,0,0,0,2,0,0)
Data <- data.frame(date, precipitation, dry.streak)  # desired result, kept for comparison

# expand the run-length encoding back to one value per row (same values as x)
make_rle_seq <- function(x) {
  rle_res <- rle(x)
  values <- rle_res$values
  lens <- rle_res$lengths
  rep(values, lens)
}

# give every row the number of the run it belongs to (1, 2, 3, ...)
make_rle_ind <- function(x) {
  rle_res <- rle(x)
  lens <- rle_res$lengths
  rep(seq_along(lens), lens)
}

Data %>%
  mutate(rles = make_rle_seq(precipitation == 0),      # TRUE on dry days
         indexes = make_rle_ind(precipitation == 0)) %>% # run number of each day
  group_by(indexes) %>%
  mutate(streaks = cumsum(rles),                        # output 1: running count within a dry run
         streaks_max = max(streaks),                    # output 2: full run length on every day of the run
         streaks_keep_first = if_else(streaks == min(streaks), streaks_max, 0L))  # output 3: length on the first day only, 0 elsewhere
#> # A tibble: 30 x 8
#> # Groups: indexes [13]
#> date precipitation dry.streak rles indexes streaks streaks_max
#> <date> <dbl> <dbl> <lgl> <int> <int> <int>
#> 1 2000-01-01 1 0 FALSE 1 0 0
#> 2 2000-01-02 0 3 TRUE 2 1 3
#> 3 2000-01-03 0 0 TRUE 2 2 3
#> 4 2000-01-04 0 0 TRUE 2 3 3
#> 5 2000-01-05 5 0 FALSE 3 0 0
#> 6 2000-01-06 4 0 FALSE 3 0 0
#> 7 2000-01-07 7 0 FALSE 3 0 0
#> 8 2000-01-08 0 1 TRUE 4 1 1
#> 9 2000-01-09 5 0 FALSE 5 0 0
#> 10 2000-01-10 0 6 TRUE 6 1 6
#> # ... with 20 more rows, and 1 more variable: streaks_keep_first <int>
Created on 2020-12-07 by the reprex package (v0.3.0)
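Whichever way the dry.streak column is built, looking at changes over time is usually easier with one row per streak. A minimal base-R sketch, assuming a data frame shaped like the one in the question (date, precipitation, dry.streak with 0 on non-start days); the names streaks, start_date, length, month and monthly are illustrative only. The per-month mean is just an example of a summary; with several years of data you might group by year or season instead.

```r
# example data from the question, including the desired dry.streak column
date <- seq(as.Date("2000-01-01"), as.Date("2000-01-30"), by = "days")
precipitation <- c(1,0,0,0,5,4,7,0,5,0,0,0,0,0,0,8,4,0,0,2,0,0,2,4,5,1,1,0,0,3)
dry.streak <- c(0,3,0,0,0,0,0,1,0,6,0,0,0,0,0,0,0,2,0,0,2,0,0,0,0,0,0,2,0,0)
Data <- data.frame(date, precipitation, dry.streak)

# keep only the first day of each dry streak: one row per streak
streaks <- Data[Data$dry.streak > 0, c("date", "dry.streak")]
names(streaks) <- c("start_date", "length")

# example summary over time: mean streak length per calendar month
streaks$month <- format(streaks$start_date, "%Y-%m")
monthly <- aggregate(length ~ month, data = streaks, FUN = mean)
monthly
```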