chipsin
chipsin

Reputation: 675

Summarizing groups where the intervals are adjacent

I have a data frame which has interval data (From and To) with two grouping variables (ID and Grouping). I am trying to summarize the data into groups which have from the same ID and Grouping, and where the intervals are adjacent.

Below is an example data frame. I know the data can be grouped using the group_by function in dplyr and can be summarized using the summarize function in dplyr (in this example summarizing using the sum of each group), but unsure how to ensure dplyr only joins groups where the intervals are adjacent.

ID <- c('ID1','ID1','ID1','ID1','ID1','ID1','ID2','ID2','ID2','ID2')
grouping <- c(1,1,1,1,1,1,1,1,1,1)
From <- c(1,5,6,20,40,41,45,52,60,61)
To <- c(5,6,10,25,41,45,52,53,61,62) 
Value <- c(1,2,1,2,2,3,1,1,1,4)
df <- data.frame(ID, grouping, From, To, Value)

    ID grouping From To Value
1  ID1        1    1  5     1
2  ID1        1    5  6     2
3  ID1        1    6 10     1
4  ID1        1   20 25     2
5  ID1        1   40 41     2
6  ID1        1   41 45     3
7  ID2        1   45 52     1
8  ID2        1   52 53     1
9  ID2        1   60 61     1
10 ID2        1   61 62     4

Below is the desired outcome

   ID grouping From To Value
1 ID1        1    1 10     4
2 ID1        1   20 25     2
3 ID1        1   40 45     5
4 ID2        1   45 53     2
5 ID2        1   60 62     5

Upvotes: 1

Views: 116

Answers (1)

Darren Tsai
Darren Tsai

Reputation: 35554

Use From != lag(To) and cumsum to generate each interval and summarize the data.

library(dplyr)

df %>%
  group_by(ID, grouping, interval = cumsum(From != lag(To, default = From[1]))) %>%
  summarise(From = min(From), To = max(To), Value = sum(Value)) %>%
  ungroup %>%
  select(-interval)

# # A tibble: 5 x 5
#   ID    grouping  From    To Value
#   <fct>    <dbl> <dbl> <dbl> <dbl>
# 1 ID1          1     1    10     4
# 2 ID1          1    20    25     2
# 3 ID1          1    40    45     5
# 4 ID2          1    45    53     2
# 5 ID2          1    60    62     5

Upvotes: 3

Related Questions