Grouping rows with same values

Question

I am working with extreme dry events and trying to obtain some information about their characteristics. This is an example of my data:

   dat <- data.frame(length= c(39,1,1,1,98,1,1,1,57,1,1,1,34,1,1), value = c(0,-1.111,-1.645,-1.285,0,-1.223,-1.369,-1.007,0,-1.083,-1.675,-1.119,0,-1.554,-1.6228))

Rows are months. In column 'length', the number 1 identifies a dry month, and column 'value' records the severity of that dry month. On the one hand, I would like to obtain the median and maximum length of dry events, treating each group of consecutive dry months (length = 1) as a single event (red boxes); on the other hand, I would like to calculate the median and minimum severity of all the dry events in the series.

This screenshot shows what I am trying to get and the values I expect to obtain.

My main question is how I can treat each group of consecutive rows with value 1 in column 'length' as a single case and calculate these simple statistics.

Thank you so much in advance for any help provided.

akrun · Accepted Answer

One option is to create a grouping variable with a run-length ID (rleid from data.table) and then use it to summarise 'value' by group: the median, the minimum, and any other statistics of interest (e.g. the number of rows, n()).

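library(dplyr)
library(data.table)  # provides rleid()
dat %>% 
    # new group id every time 'length == 1' switches between TRUE and FALSE
    group_by(grp = rleid(length == 1)) %>% 
    # keep only the dry months; each remaining group is one dry event
    filter(length == 1) %>% 
    # per event: number of months, median and minimum severity
    summarise(Length = n(), Median = median(value), Min = min(value))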
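To see what the grouping variable looks like, here is a minimal base R sketch that builds the same kind of run id from rle(). This is an illustration rather than part of the answer's code, and the names is_dry, runs and grp are only chosen for this example.

```r
dat <- data.frame(length = c(39,1,1,1,98,1,1,1,57,1,1,1,34,1,1),
                  value  = c(0,-1.111,-1.645,-1.285,0,-1.223,-1.369,-1.007,
                             0,-1.083,-1.675,-1.119,0,-1.554,-1.6228))

# TRUE for dry months, FALSE otherwise
is_dry <- dat$length == 1

# run-length encoding of the TRUE/FALSE sequence
runs <- rle(is_dry)

# one id per run, repeated for every row in that run
grp <- rep(seq_along(runs$lengths), runs$lengths)
grp
# [1] 1 2 2 2 3 4 4 4 5 6 6 6 7 8 8
```

The even-numbered ids (2, 4, 6, 8) are the dry events here, because the series starts with a non-dry row; filtering on length == 1 keeps exactly those groups.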

Or, similarly, with data.table: first create the grouping variable with rleid, then specify a logical expression in i to keep only the rows where 'length' equals 1, and, grouped by 'grp', get the number of rows and the median and min (or max) of the 'value' column.

library(data.table)
# setDT() converts 'dat' to a data.table by reference; ':=' adds the 'grp' column in place
setDT(dat)[, grp := rleid(length==1)][length == 1, 
   .(Length = .N, Median = median(value), Min = min(value)), .(grp)]
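Both snippets return one row per dry event. If the goal is a single set of figures for the whole series, as the question describes, the per-event lengths and the dry-month severities can be summarised once more. The following is a base R sketch under that reading of the question (severity statistics taken over all dry months); the names event_lengths and dry_values are only chosen for this example.

```r
dat <- data.frame(length = c(39,1,1,1,98,1,1,1,57,1,1,1,34,1,1),
                  value  = c(0,-1.111,-1.645,-1.285,0,-1.223,-1.369,-1.007,
                             0,-1.083,-1.675,-1.119,0,-1.554,-1.6228))

is_dry <- dat$length == 1
runs   <- rle(is_dry)

# length (in months) of each run of consecutive dry months
event_lengths <- runs$lengths[runs$values]
event_lengths          # 3 3 3 2

median(event_lengths)  # 3
max(event_lengths)     # 3

# severity of every dry month, pooled over all events
dry_values <- dat$value[is_dry]
median(dry_values)     # -1.285
min(dry_values)        # -1.675
```

If severity should instead be summarised per event first (for example, the minimum of each event), the per-event table from either snippet above can be passed to median() and min() in the same way.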
