Reputation: 675
I have a data frame which has interval data (From and To) with two grouping variables (ID and Grouping). I am trying to summarize the data into groups which have from the same ID and Grouping, and where the intervals are adjacent.
Below is an example data frame. I know the data can be grouped using the group_by function in dplyr and can be summarized using the summarize function in dplyr (in this example summarizing using the sum of each group), but unsure how to ensure dplyr only joins groups where the intervals are adjacent.
ID <- c('ID1','ID1','ID1','ID1','ID1','ID1','ID2','ID2','ID2','ID2')
grouping <- c(1,1,1,1,1,1,1,1,1,1)
From <- c(1,5,6,20,40,41,45,52,60,61)
To <- c(5,6,10,25,41,45,52,53,61,62)
Value <- c(1,2,1,2,2,3,1,1,1,4)
df <- data.frame(ID, grouping, From, To, Value)
ID grouping From To Value
1 ID1 1 1 5 1
2 ID1 1 5 6 2
3 ID1 1 6 10 1
4 ID1 1 20 25 2
5 ID1 1 40 41 2
6 ID1 1 41 45 3
7 ID2 1 45 52 1
8 ID2 1 52 53 1
9 ID2 1 60 61 1
10 ID2 1 61 62 4
Below is the desired outcome
ID grouping From To Value
1 ID1 1 1 10 4
2 ID1 1 20 25 2
3 ID1 1 40 45 5
4 ID2 1 45 53 2
5 ID2 1 60 62 5
Upvotes: 1
Views: 116
Reputation: 35554
Use From != lag(To)
and cumsum
to generate each interval and summarize the data.
library(dplyr)
df %>%
group_by(ID, grouping, interval = cumsum(From != lag(To, default = From[1]))) %>%
summarise(From = min(From), To = max(To), Value = sum(Value)) %>%
ungroup %>%
select(-interval)
# # A tibble: 5 x 5
# ID grouping From To Value
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 ID1 1 1 10 4
# 2 ID1 1 20 25 2
# 3 ID1 1 40 45 5
# 4 ID2 1 45 53 2
# 5 ID2 1 60 62 5
Upvotes: 3