Cezary
Cezary

Reputation: 25

Numbering rows within groups with data gaps in R

so I have similar problem like one that was described here: Numbering rows within groups in a data frame

I want to give and id to rows within groups again, but the main difference is that I have a date field, which sometimes has gaps in it. A gap would also indicate, that we have a new group.

Here is my code and ​what I've achieved:

data <- data %>%
  arrange(group1, group2, day) %>%
  group_by(group1, group2) %>%
  mutate(id_day = row_number())
       day  group1  group2  id_day
2020-05-01       A       B       1
2020-05-02       A       B       2
2020-05-03       A       B       3
2020-05-04       A       B       4
2020-05-07       A       B       5
2020-05-08       A       B       6
2020-05-09       A       B       7
2020-06-05       C       D       1
2020-06-06       C       D       2
2020-06-07       C       D       3
2020-06-08       C       D       4
2020-06-09       C       D       5
2020-06-10       C       D       6
2020-06-11       C       D       7

And this is what I want to achieve:

       day  group1  group2  id_day
2020-05-01       A       B       1
2020-05-02       A       B       2
2020-05-03       A       B       3
2020-05-04       A       B       4
2020-05-07       A       B       1
2020-05-08       A       B       2
2020-05-09       A       B       3
2020-06-05       C       D       1
2020-06-06       C       D       2
2020-06-07       C       D       3
2020-06-08       C       D       4
2020-06-09       C       D       5
2020-06-10       C       D       6
2020-06-11       C       D       7

Any ideas how to do that?

Thanks,

Upvotes: 0

Views: 75

Answers (2)

GKi
GKi

Reputation: 39647

In case it is ordered you can use cumsum of the diff of day to get a new grouping.

with(data, ave(id_day, group1, group2,
  c(0, cumsum(diff(day) > 1)), FUN = seq_along))
# [1] 1 2 3 4 1 2 3 1 2 3 4 5 6 7

Assuming that column day is as.Date.

In case it is not a Date:

data$day <- as.Date(data$day)

In case it is not ordered:

data <- data[do.call(order, data[c("group1","group2","day")]),]

Upvotes: 4

Ronak Shah
Ronak Shah

Reputation: 388817

You can add another value in group_by which would change the value when the difference of current date and previous date is greater than 1.

library(dplyr)

data %>%
  mutate(day = as.Date(day)) %>%
  group_by(group1, group2,
           date_gap = cumsum(day - lag(day, default = first(day)) > 1)) %>%
  mutate(id_day = row_number()) %>%
  ungroup %>%
  select(-date_gap)

#    day        group1 group2 id_day
#   <date>     <chr>  <chr>   <int>
# 1 2020-05-01 A      B           1
# 2 2020-05-02 A      B           2
# 3 2020-05-03 A      B           3
# 4 2020-05-04 A      B           4
# 5 2020-05-07 A      B           1
# 6 2020-05-08 A      B           2
# 7 2020-05-09 A      B           3
# 8 2020-06-05 C      D           1
# 9 2020-06-06 C      D           2
#10 2020-06-07 C      D           3
#11 2020-06-08 C      D           4
#12 2020-06-09 C      D           5
#13 2020-06-10 C      D           6
#14 2020-06-11 C      D           7

data

data <- structure(list(day = c("2020-05-01", "2020-05-02", "2020-05-03", 
"2020-05-04", "2020-05-07", "2020-05-08", "2020-05-09", "2020-06-05", 
"2020-06-06", "2020-06-07", "2020-06-08", "2020-06-09", "2020-06-10", 
"2020-06-11"), group1 = c("A", "A", "A", "A", "A", "A", "A", 
"C", "C", "C", "C", "C", "C", "C"), group2 = c("B", "B", "B", 
"B", "B", "B", "B", "D", "D", "D", "D", "D", "D", "D"), id_day = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L)), 
class = "data.frame", row.names = c(NA, -14L))

Upvotes: 1

Related Questions