Reputation: 18561
I have a data.frame of (sub)string positions within a larger string. The data contains the start of each (sub)string and its length. The end position of each (sub)string can be easily calculated.
data1 <- data.frame(start = c(1,3,4,9,10,13),
length = c(2,1,3,1,2,1)
)
data1$end <- (data1$start + data1$length - 1)
data1
#>   start length end
#> 1     1      2   2
#> 2     3      1   3
#> 3     4      3   6
#> 4     9      1   9
#> 5    10      2  11
#> 6    13      1  13
Created on 2019-12-10 by the reprex package (v0.3.0)
I would like to 'compress' this data.frame by merging contiguous (sub)strings (strings that directly follow one another) so that my new data looks like this:
data2 <- data.frame(start = c(1,9,13),
length = c(6,3,1)
)
data2$end <- (data2$start + data2$length - 1)
data2
#>   start length end
#> 1     1      6   6
#> 2     9      3  11
#> 3    13      1  13
Is there a solution, preferably in base R, that gets me from data1 to data2?
Upvotes: 3
Views: 160
Reputation: 32548
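# Start a new group wherever a start is not exactly 1 past the previous end
f = cumsum(with(data1, c(0, start[-1] - head(end, -1))) != 1)
# Collapse each group to one row spanning its first start to its last end
do.call(rbind, lapply(split(data1, f), function(x){
    with(x, data.frame(start = start[1],
                       length = tail(end, 1) - start[1] + 1,
                       end = tail(end, 1)))}))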
#  start length end
#1     1      6   6
#2     9      3  11
#3    13      1  13
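To see why the grouping step works, it may help to look at the intermediate values. The sketch below rebuilds data1 from the question and splits the one-liner into two steps; the name gap is introduced here purely for illustration and is not part of the answer above. It assumes the rows are already sorted by start.

```r
data1 <- data.frame(start = c(1, 3, 4, 9, 10, 13),
                    length = c(2, 1, 3, 1, 2, 1))
data1$end <- data1$start + data1$length - 1

# Distance from the previous end to each start; 1 means the two ranges touch.
# The leading 0 is a placeholder so the first row always opens a group.
gap <- c(0, data1$start[-1] - head(data1$end, -1))
gap
#> [1] 0 1 1 3 1 2

# Every gap other than 1 bumps the running count, giving one id per run.
f <- cumsum(gap != 1)
f
#> [1] 1 1 1 2 2 3
```

Rows sharing the same value of f belong to one contiguous run, which is what split(data1, f) then separates.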
Upvotes: 2
Reputation: 28955
Using dplyr, we can do the following:
library(dplyr)
data1 %>%
  # New group whenever start does not directly follow the previous end
  group_by(consecutive = cumsum(start != lag(end, default = 0) + 1)) %>%
  summarise(start = min(start), length = sum(length), end = max(end)) %>%
  ungroup() %>% select(-consecutive)
#> # A tibble: 3 x 3
#>   start length   end
#>   <dbl>  <dbl> <dbl>
#> 1     1      6     6
#> 2     9      3    11
#> 3    13      1    13
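Since the question asks for base R, the same group-then-summarise idea can probably be expressed without dplyr by using tapply(). The following is only a sketch: merge_runs is a hypothetical helper name, and it assumes the input has at least one row, is sorted by start, and contains start and end columns.

```r
data1 <- data.frame(start = c(1, 3, 4, 9, 10, 13),
                    length = c(2, 1, 3, 1, 2, 1))
data1$end <- data1$start + data1$length - 1

# Hypothetical helper: merges rows whose start directly follows the previous end.
merge_runs <- function(df) {
  # TRUE marks the first row of each run; cumsum turns the marks into group ids.
  grp <- cumsum(c(TRUE, df$start[-1] != head(df$end, -1) + 1))
  start <- as.vector(tapply(df$start, grp, min))
  end <- as.vector(tapply(df$end, grp, max))
  data.frame(start = start, length = end - start + 1, end = end)
}

merge_runs(data1)
```

Here the length is recomputed from start and end rather than summed, which gives the same result for contiguous runs.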
Upvotes: 2