Reputation: 25
so I have similar problem like one that was described here: Numbering rows within groups in a data frame
I want to give and id to rows within groups again, but the main difference is that I have a date field, which sometimes has gaps in it. A gap would also indicate, that we have a new group.
Here is my code and what I've achieved:
data <- data %>%
arrange(group1, group2, day) %>%
group_by(group1, group2) %>%
mutate(id_day = row_number())
day group1 group2 id_day
2020-05-01 A B 1
2020-05-02 A B 2
2020-05-03 A B 3
2020-05-04 A B 4
2020-05-07 A B 5
2020-05-08 A B 6
2020-05-09 A B 7
2020-06-05 C D 1
2020-06-06 C D 2
2020-06-07 C D 3
2020-06-08 C D 4
2020-06-09 C D 5
2020-06-10 C D 6
2020-06-11 C D 7
And this is what I want to achieve:
day group1 group2 id_day
2020-05-01 A B 1
2020-05-02 A B 2
2020-05-03 A B 3
2020-05-04 A B 4
2020-05-07 A B 1
2020-05-08 A B 2
2020-05-09 A B 3
2020-06-05 C D 1
2020-06-06 C D 2
2020-06-07 C D 3
2020-06-08 C D 4
2020-06-09 C D 5
2020-06-10 C D 6
2020-06-11 C D 7
Any ideas how to do that?
Thanks,
Upvotes: 0
Views: 75
Reputation: 39647
In case it is ordered you can use cumsum
of the diff
of day to get a new grouping.
with(data, ave(id_day, group1, group2,
c(0, cumsum(diff(day) > 1)), FUN = seq_along))
# [1] 1 2 3 4 1 2 3 1 2 3 4 5 6 7
Assuming that column day is as.Date
.
In case it is not a Date:
data$day <- as.Date(data$day)
In case it is not ordered:
data <- data[do.call(order, data[c("group1","group2","day")]),]
Upvotes: 4
Reputation: 388817
You can add another value in group_by
which would change the value when the difference of current date and previous date is greater than 1.
library(dplyr)
data %>%
mutate(day = as.Date(day)) %>%
group_by(group1, group2,
date_gap = cumsum(day - lag(day, default = first(day)) > 1)) %>%
mutate(id_day = row_number()) %>%
ungroup %>%
select(-date_gap)
# day group1 group2 id_day
# <date> <chr> <chr> <int>
# 1 2020-05-01 A B 1
# 2 2020-05-02 A B 2
# 3 2020-05-03 A B 3
# 4 2020-05-04 A B 4
# 5 2020-05-07 A B 1
# 6 2020-05-08 A B 2
# 7 2020-05-09 A B 3
# 8 2020-06-05 C D 1
# 9 2020-06-06 C D 2
#10 2020-06-07 C D 3
#11 2020-06-08 C D 4
#12 2020-06-09 C D 5
#13 2020-06-10 C D 6
#14 2020-06-11 C D 7
data
data <- structure(list(day = c("2020-05-01", "2020-05-02", "2020-05-03",
"2020-05-04", "2020-05-07", "2020-05-08", "2020-05-09", "2020-06-05",
"2020-06-06", "2020-06-07", "2020-06-08", "2020-06-09", "2020-06-10",
"2020-06-11"), group1 = c("A", "A", "A", "A", "A", "A", "A",
"C", "C", "C", "C", "C", "C", "C"), group2 = c("B", "B", "B",
"B", "B", "B", "B", "D", "D", "D", "D", "D", "D", "D"), id_day = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L)),
class = "data.frame", row.names = c(NA, -14L))
Upvotes: 1