roboes
roboes

Reputation: 401

Aggregate IDs given type min_date and max_date change through time

I am trying to aggregate date by type through time, accounting for the changes occurred during the whole time period for the local min_date and max_date for each interval of time.

id  type    date
1   A       2019-01-01
1   A       2019-01-02
1   B       2019-01-03
1   B       2019-01-04
2   A       2019-01-01
2   A       2019-01-02
2   B       2019-01-03
2   A       2019-01-04

Code:

df <- data.frame(id = c(1, 1, 1, 1, 2, 2, 2, 2), type = c("A", "A", "B", "B", "A", "A", "B", "C"), date = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"), "%Y-%m-%d"))

The result that I am trying to get:

id  type    min_date    max_date
1   A       2019-01-01  2019-01-02
1   B       2019-01-03  2019-01-04
2   A       2019-01-01  2019-01-02
2   B       2019-01-03  2019-01-03
2   A       2019-01-04  2019-01-04

I tired the following code, but it only shows the global min_date and max_date - not accounting for the changes through time:

library(dplyr)
df_changes <- df %>% group_by(id, type) %>% summarise(listings = n(), min_date = min(date), max_date = max(date))

Upvotes: 2

Views: 69

Answers (3)

Jon Spring
Jon Spring

Reputation: 66425

Another alternative, just counting changes in Type so we distinguish between each time a type returns.

df %>%
  # New code: count how many times type is different from the prior line
  mutate(new_period = cumsum(type != lag(type,default = ""))) %>%
  group_by(id, type, new_period) %>% 

  # Orig code
  summarise(listings = n(), min_date = min(date), max_date = max(date))

Upvotes: 0

akrun
akrun

Reputation: 887048

Here is an option using data.table

library(data.table)
setDT(df)[, as.list(range(date)) , .(id, type, grp = rleid(type))][, grp := NULL][]
#  id type         V1         V2
#1:  1    A 2019-01-01 2019-01-02
#2:  1    B 2019-01-03 2019-01-04
#3:  2    A 2019-01-01 2019-01-02
#4:  2    B 2019-01-03 2019-01-03
#5:  2    C 2019-01-04 2019-01-04

Upvotes: 1

tmfmnk
tmfmnk

Reputation: 39858

You can do:

df %>%
 group_by(rleid = with(rle(type), rep(seq_along(lengths), lengths))) %>%
 summarise(min_date = min(date),
           max_data = max(date),
           id = first(id),
           type = first(type)) %>%
 ungroup() %>%
 select(-rleid)

  min_date   max_data      id type 
  <date>     <date>     <dbl> <chr>
1 2019-01-01 2019-01-02     1 A    
2 2019-01-03 2019-01-04     1 B    
3 2019-01-01 2019-01-02     2 A    
4 2019-01-03 2019-01-03     2 B    
5 2019-01-04 2019-01-04     2 C 

Or:

df %>%
 group_by(rleid = with(rle(type), rep(seq_along(lengths), lengths)), id, type) %>%
 summarise(min_date = min(date),
           max_data = max(date)) %>%
 ungroup() %>%
 select(-rleid)

Sample data:

df <- data.frame(id = c(1, 1, 1, 1, 2, 2, 2, 2), type = c("A", "A", "B", "B", "A", "A", "B", "C"), date = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"), "%Y-%m-%d"),
                  stringsAsFactors = FALSE)

Upvotes: 2

Related Questions