Reputation: 401
I am trying to aggregate date by type through time, accounting for the changes occurred during the whole time period for the local min_date and max_date for each interval of time.
id type date
1 A 2019-01-01
1 A 2019-01-02
1 B 2019-01-03
1 B 2019-01-04
2 A 2019-01-01
2 A 2019-01-02
2 B 2019-01-03
2 A 2019-01-04
Code:
df <- data.frame(id = c(1, 1, 1, 1, 2, 2, 2, 2), type = c("A", "A", "B", "B", "A", "A", "B", "C"), date = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"), "%Y-%m-%d"))
The result that I am trying to get:
id type min_date max_date
1 A 2019-01-01 2019-01-02
1 B 2019-01-03 2019-01-04
2 A 2019-01-01 2019-01-02
2 B 2019-01-03 2019-01-03
2 A 2019-01-04 2019-01-04
I tired the following code, but it only shows the global min_date and max_date - not accounting for the changes through time:
library(dplyr)
df_changes <- df %>% group_by(id, type) %>% summarise(listings = n(), min_date = min(date), max_date = max(date))
Upvotes: 2
Views: 69
Reputation: 66425
Another alternative, just counting changes in Type so we distinguish between each time a type returns.
df %>%
# New code: count how many times type is different from the prior line
mutate(new_period = cumsum(type != lag(type,default = ""))) %>%
group_by(id, type, new_period) %>%
# Orig code
summarise(listings = n(), min_date = min(date), max_date = max(date))
Upvotes: 0
Reputation: 887048
Here is an option using data.table
library(data.table)
setDT(df)[, as.list(range(date)) , .(id, type, grp = rleid(type))][, grp := NULL][]
# id type V1 V2
#1: 1 A 2019-01-01 2019-01-02
#2: 1 B 2019-01-03 2019-01-04
#3: 2 A 2019-01-01 2019-01-02
#4: 2 B 2019-01-03 2019-01-03
#5: 2 C 2019-01-04 2019-01-04
Upvotes: 1
Reputation: 39858
You can do:
df %>%
group_by(rleid = with(rle(type), rep(seq_along(lengths), lengths))) %>%
summarise(min_date = min(date),
max_data = max(date),
id = first(id),
type = first(type)) %>%
ungroup() %>%
select(-rleid)
min_date max_data id type
<date> <date> <dbl> <chr>
1 2019-01-01 2019-01-02 1 A
2 2019-01-03 2019-01-04 1 B
3 2019-01-01 2019-01-02 2 A
4 2019-01-03 2019-01-03 2 B
5 2019-01-04 2019-01-04 2 C
Or:
df %>%
group_by(rleid = with(rle(type), rep(seq_along(lengths), lengths)), id, type) %>%
summarise(min_date = min(date),
max_data = max(date)) %>%
ungroup() %>%
select(-rleid)
Sample data:
df <- data.frame(id = c(1, 1, 1, 1, 2, 2, 2, 2), type = c("A", "A", "B", "B", "A", "A", "B", "C"), date = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04", "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"), "%Y-%m-%d"),
stringsAsFactors = FALSE)
Upvotes: 2