Lev
Lev

Reputation: 833

One column date into two column dates

I have the following input

id  date
GAB 2018-02-06
GAB 2018-02-08
GEW 2018-02-09
GEW 2018-02-10
GEW 2018-02-13
GAB 2018-02-14
GAB 2018-02-17
EWP 2018-02-25
EWP 2018-02-26
EWP 2018-02-27

I would like the following output.

id  start       end
GAB 2018-02-06  2018-02-08
GEW 2018-02-09  2018-02-13
GAB 2018-02-14  2018-02-17
EWP 2018-02-25  2018-02-27

Is there any easy way to do it?

Upvotes: 1

Views: 46

Answers (2)

akrun
akrun

Reputation: 887851

We can group by rleid of 'id' along with 'id', and get the min and max of 'date' in summarise

library(dplyr)
library(data.table)
df1 %>% 
    group_by(id, grp = rleid(id)) %>%
    # // or do the cumulative sum of a logical vector from 
    # // comparing with the previous adjacent value
    # group_by(grp = cumsum(id != lag(id, default = first(id))), id) %>%
    summarise(start = min(date), end = max(date), .groups = 'drop') %>%
    select(-grp)

-output

# A tibble: 4 x 3
#  id    start      end       
#  <chr> <date>     <date>    
#1 EWP   2018-02-25 2018-02-27
#2 GAB   2018-02-06 2018-02-08
#3 GAB   2018-02-14 2018-02-17
#4 GEW   2018-02-09 2018-02-13

The comment below is incorrect as it will remove the 'id' column from the output.

data

df1 <- structure(list(id = c("GAB", "GAB", "GEW", "GEW", "GEW", "GAB", 
"GAB", "EWP", "EWP", "EWP"), date = structure(c(17568, 17570, 
17571, 17572, 17575, 17576, 17579, 17587, 17588, 17589), class = "Date")), 
row.names = c(NA, 
-10L), class = "data.frame")

Upvotes: 4

ThomasIsCoding
ThomasIsCoding

Reputation: 102625

A data.table option

setDT(df)[,.(id = unique(id), start = min(date), end = max(date)),rleid(id)][,-1]

gives

    id      start        end
1: GAB 2018-02-06 2018-02-08
2: GEW 2018-02-09 2018-02-13
3: GAB 2018-02-14 2018-02-17
4: EWP 2018-02-25 2018-02-27

Upvotes: 2

Related Questions