Reputation: 833
I have the following input
id date
GAB 2018-02-06
GAB 2018-02-08
GEW 2018-02-09
GEW 2018-02-10
GEW 2018-02-13
GAB 2018-02-14
GAB 2018-02-17
EWP 2018-02-25
EWP 2018-02-26
EWP 2018-02-27
I would like the following output.
id start end
GAB 2018-02-06 2018-02-08
GEW 2018-02-09 2018-02-13
GAB 2018-02-14 2018-02-17
EWP 2018-02-25 2018-02-27
Is there any easy way to do it?
Upvotes: 1
Views: 46
Reputation: 887851
We can group by rleid
of 'id' along with 'id', and get the min
and max
of 'date' in summarise
library(dplyr)
library(data.table)
df1 %>%
group_by(id, grp = rleid(id)) %>%
# // or do the cumulative sum of a logical vector from
# // comparing with the previous adjacent value
# group_by(grp = cumsum(id != lag(id, default = first(id))), id) %>%
summarise(start = min(date), end = max(date), .groups = 'drop') %>%
select(-grp)
-output
# A tibble: 4 x 3
# id start end
# <chr> <date> <date>
#1 EWP 2018-02-25 2018-02-27
#2 GAB 2018-02-06 2018-02-08
#3 GAB 2018-02-14 2018-02-17
#4 GEW 2018-02-09 2018-02-13
The comment below is incorrect as it will remove the 'id' column from the output.
df1 <- structure(list(id = c("GAB", "GAB", "GEW", "GEW", "GEW", "GAB",
"GAB", "EWP", "EWP", "EWP"), date = structure(c(17568, 17570,
17571, 17572, 17575, 17576, 17579, 17587, 17588, 17589), class = "Date")),
row.names = c(NA,
-10L), class = "data.frame")
Upvotes: 4
Reputation: 102625
A data.table
option
setDT(df)[,.(id = unique(id), start = min(date), end = max(date)),rleid(id)][,-1]
gives
id start end
1: GAB 2018-02-06 2018-02-08
2: GEW 2018-02-09 2018-02-13
3: GAB 2018-02-14 2018-02-17
4: EWP 2018-02-25 2018-02-27
Upvotes: 2