Reputation: 143
I am working with extreme dry events and trying to obtain some information about their characteristics. This is an example of my data:
dat <- data.frame(length= c(39,1,1,1,98,1,1,1,57,1,1,1,34,1,1), value = c(0,-1.111,-1.645,-1.285,0,-1.223,-1.369,-1.007,0,-1.083,-1.675,-1.119,0,-1.554,-1.6228))
Rows are months, thus column 'length' identifies a dry month with the number 1, for its part, column 'value' records the severity of that dry event. What I would like to obtain is, on one hand the median and maximum length of dry events but considering each group of consecutive dry months (length = 1) as an event (red boxes); on the other hand I would like to calculate the median and minimum values of the severity of all the dry events in the serie.
This screenshot shows what I am trying to get and the values I expect to obtain.
My main question is how I can considerate the groups of consecutive rows with value 1 in column 'length' as an unique case and calculate these simple statistics.
Thank you so much in advance for any help provided.
Upvotes: 3
Views: 115
Reputation: 1702
A fully dplyr
solution:
library(tidyverse)
dat2 <- dat %>%
mutate(zero_count = if_else(value == 0, 1, 0),
group_id = cumsum(zero_count)) %>%
filter(length == 1) %>%
group_by(group_id) %>%
summarise(mean = mean(value), median = median(value), months = length(group_id))
Firstly I added a new column to flag if there's a zero-value. Then a new column which we cumulatively sum so we can identify each group sequentially / individually. Then group by this new column and filter out the other rows that don't belong. Finally summarise to get the stats.
group_id mean median months
<dbl> <dbl> <dbl> <int>
1 1 -1.35 -1.28 3
2 2 -1.20 -1.22 3
3 3 -1.29 -1.12 3
4 4 -1.59 -1.59 2
Upvotes: 1
Reputation: 887881
One option would be to create a grouping variable with run-length-id (rleid
) and then use that to summarise
the 'value' for median
, min
and other statistics of interest (i.e. number of rows - n()
)
library(dplyr)
library(data.table)
dat %>%
group_by(grp = rleid(length == 1)) %>%
filter(length == 1) %>%
summarise(Length = n(), Median = median(value), Min = min(value))
Or similar way with data.table
by first creating a grouping variable with rleid
, grouped by the 'grp' and specifying the i
with the logical expression to subset the rows that are only equal to 1 in 'length',get the median
and min
(or max
) in 'value' column
library(data.table)
setDT(dat)[, grp := rleid(length==1)][length == 1,
.(Length = .N, Median = median(value), Min = min(value)), .(grp)]
Upvotes: 1