Robert

Reputation: 65

Creating a column that counts consecutive days from a different column in R

I've been struggling with this for a while and could use some help.

I am trying to look at changes in consecutive dry days in a meteorological dataset. My dataframe has two columns, date and precipitation, and it looks something like this:

date <- seq(as.Date("2000-01-01"), as.Date("2000-01-30"), by="days")
precipitation <- c(1,0,0,0,5,4,7,0,5,0,0,0,0,0,0,8,4,0,0,2,0,0,2,4,5,1,1,0,0,3)
Data <- data.frame(date, precipitation)

I'd like to create a new column that shows the length of each stretch of consecutive days where precipitation==0. In other words, I would like the output to look something like this:

dry.streak <- c(0,3,0,0,0,0,0,1,0,6,0,0,0,0,0,0,0,2,0,0,2,0,0,0,0,0,0,2,0,0)
Data <- data.frame(date, precipitation, dry.streak)

Here the value in dry.streak is the number of days with precipitation==0, counted from that day forward until it rains again (precipitation>0), and it is filled in only on the first day of each run of consecutive dry days. On every other day the value should be zero, NA, or something else I can easily filter out, so that I can look at each streak independently when I run statistics on this.

So far, I've used dry.streak2 <- with(rle(as.logical(Data$precipitation==0)), lengths[values]) to create a separate vector holding the length of each dry streak, but I don't know how to match this up with my dates, which I'll need to do in order to look at changes in the length of dry streaks over time.

I'd appreciate any help you can give. Thank you!

Upvotes: 3

Views: 1116

Answers (2)

J Thompson

Reputation: 164

Not the most elegant solution, but it gets the job done. Use a for loop (or sapply if you like) and translate the steps of the algorithm into if-else statements. We need to start a counter on the first dry day after a wet day and keep track of that first day. Then, when the streak ends, store the total number of dry days at the position of that first day and reset the counter.

date <- seq(as.Date("2000-01-01"), as.Date("2000-01-30"), by="days")
precipitation <- c(1,0,0,0,5,4,7,0,5,0,0,0,0,0,0,8,4,0,0,2,0,0,2,4,5,1,1,0,0,3)
Data <- data.frame(date, precipitation)

count <- 0 # number of dry days in the current streak
startLoc <- NA # index of the first day of the current streak
dry.streak <- rep(0, length(Data$precipitation)) # storage vector

# loop through the precip vector
for(i in seq_along(Data$precipitation)){
  p <- Data$precipitation[i] # get the current value
  if(count==0 && p==0){ # if the previous day was not dry and today is
    startLoc <- i # remember where the dry spell starts
    count <- count+1 # add one to the count
  }else if(count!=0 && p==0){ # if the dry spell is continuing
    count <- count+1 # add one to the count
  }else if(count!=0){ # a wet day that ends a dry spell
    dry.streak[startLoc] <- count # stick the count into the starting day
    count <- 0 # reset the counter
  } # a wet day after a wet day needs no action
}

# if the data end in the middle of a dry spell, store that last count too
if(count!=0) dry.streak[startLoc] <- count
Data$dry.streak <- dry.streak
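For comparison, here is a minimal sketch of the same idea without a loop, building on the rle() call from the question. It recovers the position of each run from the cumulative sum of the run lengths. The names runs, run_start, run_end, and dry.streak.rle are just illustrative, and the sketch assumes precipitation has no missing values.

```r
date <- seq(as.Date("2000-01-01"), as.Date("2000-01-30"), by = "days")
precipitation <- c(1,0,0,0,5,4,7,0,5,0,0,0,0,0,0,8,4,0,0,2,0,0,2,4,5,1,1,0,0,3)
Data <- data.frame(date, precipitation)

# runs of consecutive dry (TRUE) and wet (FALSE) days
runs <- rle(Data$precipitation == 0)

# row index of the last and first day of every run
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# write each dry run's length onto its first day; every other day stays 0
dry.streak.rle <- rep(0, nrow(Data))
dry.streak.rle[run_start[runs$values]] <- runs$lengths[runs$values]
Data$dry.streak.rle <- dry.streak.rle
```

Because run_start holds row indices into Data, Data$date[run_start[runs$values]] gives the start date of each dry streak directly.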

Upvotes: 1

Bruno

Reputation: 4150

Here is my take on the problem. I provided three possible outputs because I wasn't sure which one you needed.

library(tidyverse)

date <- seq(as.Date("2000-01-01"), as.Date("2000-01-30"), by="days")
precipitation <- c(1,0,0,0,5,4,7,0,5,0,0,0,0,0,0,8,4,0,0,2,0,0,2,4,5,1,1,0,0,3)
Data <- data.frame(date, precipitation)


  
dry.streak <- c(0,3,0,0,0,0,0,1,0,6,0,0,0,0,0,0,0,2,0,0,2,0,0,0,0,0,0,2,0,0)
Data <- data.frame(date, precipitation, dry.streak)


make_rle_seq <- function(x) {
  rle_res <- rle(x)
  values <- rle_res$values
  lens <- rle_res$lengths
  rep(values,lens)
}


make_rle_ind <- function(x) {
  rle_res <- rle(x)
  values <- rle_res$values
  lens <- rle_res$lengths
  rep(1:length(lens),lens)
}

Data %>% 
  mutate(rles = make_rle_seq(precipitation==0),
         indexes = make_rle_ind(precipitation==0)) %>% 
  group_by(indexes) %>% 
  mutate(streaks = cumsum(rles),
         streaks_max = max(streaks),
         streaks_keep_first = if_else(streaks == min(streaks),streaks_max,0L))
#> # A tibble: 30 x 8
#> # Groups:   indexes [13]
#>    date       precipitation dry.streak rles  indexes streaks streaks_max
#>    <date>             <dbl>      <dbl> <lgl>   <int>   <int>       <int>
#>  1 2000-01-01             1          0 FALSE       1       0           0
#>  2 2000-01-02             0          3 TRUE        2       1           3
#>  3 2000-01-03             0          0 TRUE        2       2           3
#>  4 2000-01-04             0          0 TRUE        2       3           3
#>  5 2000-01-05             5          0 FALSE       3       0           0
#>  6 2000-01-06             4          0 FALSE       3       0           0
#>  7 2000-01-07             7          0 FALSE       3       0           0
#>  8 2000-01-08             0          1 TRUE        4       1           1
#>  9 2000-01-09             5          0 FALSE       5       0           0
#> 10 2000-01-10             0          6 TRUE        6       1           6
#> # ... with 20 more rows, and 1 more variable: streaks_keep_first <int>

Created on 2020-12-07 by the reprex package (v0.3.0)
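Once a "first day only" column exists, keeping one row per streak is a plain filter. The sketch below uses base R and the dry.streak vector from the question so that it runs on its own; the name streak_starts is only illustrative, and the same subsetting should work on a column such as streaks_keep_first.

```r
date <- seq(as.Date("2000-01-01"), as.Date("2000-01-30"), by = "days")
precipitation <- c(1,0,0,0,5,4,7,0,5,0,0,0,0,0,0,8,4,0,0,2,0,0,2,4,5,1,1,0,0,3)
dry.streak <- c(0,3,0,0,0,0,0,1,0,6,0,0,0,0,0,0,0,2,0,0,2,0,0,0,0,0,0,2,0,0)
Data <- data.frame(date, precipitation, dry.streak)

# keep only the first day of each dry streak: one row per streak
streak_starts <- Data[Data$dry.streak > 0, c("date", "dry.streak")]

# simple summaries of streak length
mean(streak_starts$dry.streak)
max(streak_starts$dry.streak)
```

From there the streak lengths can be grouped by month or year of the start date to look at changes over time.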

Upvotes: 2
