CCurtis
CCurtis

Reputation: 1932

Sum Event Data in R

I'm working with some daily rainfall data that spans several years. I want to sum the rainfall on consecutive rainy day to get a rainfall total for that rainfall event. It would also be nice to get a start and stop date and rainfall intensity per event. I'm thinking I could hack something together with aggregate however what I'm thinking of doing in my head seems very bulky. Is there a quick and elegant solution possibly to be found with dplyr,tdyror data.table.

Data

structure(list(Time = structure(c(1353398400, 1353484800, 1353571200, 
1353657600, 1353744000, 1353830400, 1353916800, 1354003200, 1354089600, 
1354176000, 1354262400, 1354348800, 1354435200, 1354521600, 1354608000, 
1354694400, 1354780800, 1354867200, 1354953600, 1355040000, 1355126400, 
1355212800, 1355299200, 1355385600, 1355472000, 1355558400, 1355644800, 
1355731200, 1355817600, 1355904000, 1355990400, 1356076800, 1356163200, 
1356249600, 1356336000, 1356422400, 1356508800, 1356595200, 1356681600, 
1356768000, 1356854400, 1356940800, 1357027200, 1357113600, 1357200000, 
1357286400, 1357372800, 1357459200, 1357545600, 1357632000, 1357718400
), class = c("POSIXct", "POSIXt"), tzone = ""), inc = c(NA, NA, 
NA, NA, NA, NA, NA, 0.11, NA, 0.62, 0.0899999999999999, 0.39, 
NA, NA, 0.03, NA, NA, NA, NA, NA, NA, 0.34, NA, NA, NA, NA, 0.0600000000000001, 
0.02, NA, NA, NA, 0.29, 0.35, 0.02, 0.27, 0.17, 0.0600000000000001, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.47, NA, NA, NA, 0.0300000000000002
)), .Names = c("Time", "inc"), row.names = 50:100, class = "data.frame")

Desired output

Begin   End Days    Total   Intensity
11/27/2012  11/27/2012  1   0.11    0.11
11/29/2012  12/1/2012   3   1.1 0.366666667
12/4/2012   12/4/2012   1   0.03    0.03
12/11/2012  12/11/2012  1   0.34    0.34
12/16/2012  12/17/2012  2   0.08    0.04
12/21/2012  12/26/2012  6   0.29    0.048333333
1/5/2013    1/5/2013    1   0.47    0.47
1/9/2013    1/9/2013    1   0.03    0.03

Upvotes: 0

Views: 297

Answers (4)

S.C
S.C

Reputation: 740

Only base packages, and basically using aggregate function. I know it is not the nicest option around. The only problem is with the format of dates (the columns of data frame must be specified one-by-one for the desired date format, otherwise it will be converted to integer):

data1 <- structure(list(Time = structure(c(1353398400, 1353484800, 1353571200, 
    1353657600, 1353744000, 1353830400, 1353916800, 1354003200, 1354089600, 
    1354176000, 1354262400, 1354348800, 1354435200, 1354521600, 1354608000, 
    1354694400, 1354780800, 1354867200, 1354953600, 1355040000, 1355126400, 
    1355212800, 1355299200, 1355385600, 1355472000, 1355558400, 1355644800, 
    1355731200, 1355817600, 1355904000, 1355990400, 1356076800, 1356163200, 
    1356249600, 1356336000, 1356422400, 1356508800, 1356595200, 1356681600, 
    1356768000, 1356854400, 1356940800, 1357027200, 1357113600, 1357200000, 
    1357286400, 1357372800, 1357459200, 1357545600, 1357632000, 1357718400
    ), class = c("POSIXct", "POSIXt"), tzone = ""), inc = c(NA, NA, 
    NA, NA, NA, NA, NA, 0.11, NA, 0.62, 0.0899999999999999, 0.39, 
    NA, NA, 0.03, NA, NA, NA, NA, NA, NA, 0.34, NA, NA, NA, NA, 0.0600000000000001, 
    0.02, NA, NA, NA, 0.29, 0.35, 0.02, 0.27, 0.17, 0.0600000000000001, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.47, NA, NA, NA, 0.0300000000000002
    )), .Names = c("Time", "inc"), row.names = 50:100, class = "data.frame")

rainruns <- function(datas = data1) {
    incs <- c(NA, datas$inc) # last column
    event <- cumsum(is.na(incs[-length(incs)]) & !is.na(incs[-1])) # counter for rain events
    datas <- cbind(datas, event) # add events column
    datas2 <- datas[!is.na(datas$inc),] # delete na's
    summarydata1 <- aggregate(datas2$inc, by = list(datas2$event), # summarize rain data by event
                              FUN = function(x) c(length(x), sum(x), mean(x)))[[2]]
    summarydata2 <- aggregate(as.Date(datas2$Time), by = list(datas2$event), # summarize dates by event
                              FUN = function(x) c(min(x), max(x)))[[2]]
    summarydata <- data.frame(format(as.Date(summarydata2[,1], # combine both, correcting date formats
                                             origin = "1970-01-01"), "%m/%d/%Y"),
                              format(as.Date(summarydata2[,2],
                                             origin = "1970-01-01"), "%m/%d/%Y"), summarydata1)
    names(summarydata) <- c("Begin", "End", "Days", "Total", "Intensity") # update column names
    return(summarydata)
}

Upvotes: 1

nestor556
nestor556

Reputation: 455

You can append a new column that group rows when they represent a continuous rainy period, then get the statistics you want using dplyr. assuming that your dataframe is called df:

library(dplyr)
rain_period = rep(NA,nrow(df)) #initialize vector
group=1 #initialize group number
for(i in 1:nrow(df)){
  if(is.na(df$inc[i])) group = group + 1
  else rain_period[i] = group
}
df$group = rain_period


result = dplyr::group_by(df,group)
result = dplyr::summarise(result,
                      Begin = min(Time),
                      End = max(Time),
                      Days = n(),
                      Total = sum(inc),
                      Intensity = mean(inc))

Upvotes: 1

davechilders
davechilders

Reputation: 9133

Here is an approach that uses dplyr.

First, some preliminary cleanup: a date variable is needed, not a POSIXct:

library(dplyr)

df2 <- df %>%
  mutate(date = as.Date(Time)) %>%
  select(-Time)

This computes a data frame with an explicit variable for rain_event:

df3 <- df2 %>%
  filter(!is.na(inc)) %>%
  mutate(
    day_lag = as.numeric(difftime(date, lag(date), units = "days")),
    # special case: first rain event
    day_lag = ifelse(is.na(day_lag), 1, day_lag),
    rain_event = 1 + cumsum(day_lag > 1)
  )

> df3
    inc       date day_lag rain_event
1  0.11 2012-11-27       1          1
2  0.62 2012-11-29       2          2
3  0.09 2012-11-30       1          2
4  0.39 2012-12-01       1          2
5  0.03 2012-12-04       3          3
6  0.34 2012-12-11       7          4
7  0.06 2012-12-16       5          5
8  0.02 2012-12-17       1          5
9  0.29 2012-12-21       4          6
10 0.35 2012-12-22       1          6
11 0.02 2012-12-23       1          6
12 0.27 2012-12-24       1          6
13 0.17 2012-12-25       1          6
14 0.06 2012-12-26       1          6
15 0.47 2013-01-05      10          7
16 0.03 2013-01-09       4          8

Now, summarise by each rain event, computing the metrics you care about:

df3 %>%
  group_by(rain_event) %>%
  summarise(
    begin = min(date),
    end = max(date),
    days = n(),
    total = sum(inc),
    intensity = mean(inc)
  )

   # A tibble: 8 × 6
  rain_event      begin        end  days total intensity
       <dbl>     <date>     <date> <int> <dbl>     <dbl>
1          1 2012-11-27 2012-11-27     1  0.11 0.1100000
2          2 2012-11-29 2012-12-01     3  1.10 0.3666667
3          3 2012-12-04 2012-12-04     1  0.03 0.0300000
4          4 2012-12-11 2012-12-11     1  0.34 0.3400000
5          5 2012-12-16 2012-12-17     2  0.08 0.0400000
6          6 2012-12-21 2012-12-26     6  1.16 0.1933333
7          7 2013-01-05 2013-01-05     1  0.47 0.4700000
8          8 2013-01-09 2013-01-09     1  0.03 0.0300000

Upvotes: 1

akuiper
akuiper

Reputation: 215117

data.table::rleid is a convenient function for dealing with consecutive values, assuming your data frame is named df and it has been sorted by Time variable before hand:

library(data.table)
setDT(df)
na.omit(df[,.(Begin = as.Date(first(Time)), 
              End = as.Date(last(Time)), 
              Days = as.Date(last(Time)) - as.Date(first(Time)) + 1, 
              Total = sum(inc), Intensity = mean(inc)),
          by = .(id = rleid(is.na(inc)))])

#   id      Begin        End   Days Total Intensity
#1:  2 2012-11-27 2012-11-27 1 days  0.11 0.1100000
#2:  4 2012-11-29 2012-12-01 3 days  1.10 0.3666667
#3:  6 2012-12-04 2012-12-04 1 days  0.03 0.0300000
#4:  8 2012-12-11 2012-12-11 1 days  0.34 0.3400000
#5: 10 2012-12-16 2012-12-17 2 days  0.08 0.0400000
#6: 12 2012-12-21 2012-12-26 6 days  1.16 0.1933333 #I think you have some miscalculation here
#7: 14 2013-01-05 2013-01-05 1 days  0.47 0.4700000
#8: 16 2013-01-09 2013-01-09 1 days  0.03 0.0300000

Upvotes: 3

Related Questions