H.Z
H.Z

Reputation: 79

Count number in group, but restart for non-consecutive dates

I have data that looks like this:

sample <- data.frame(
  group = c("A","A","A","B","B","B"),
  date = c(as.Date("2014-12-31"),
           as.Date("2015-01-31"),
           as.Date("2015-02-28"),
           as.Date("2015-01-31"),
           as.Date("2015-03-31"),
           as.Date("2015-04-30")),
  obs = c(100, 200, 300, 50, 100, 150)
)

Note that the date variable always takes the last date of the month. In table format, the data looks like this:

  group       date obs
1     A 2014-12-31 100
2     A 2015-01-31 200
3     A 2015-02-28 300
4     B 2015-01-31  50
5     B 2015-03-31 100
6     B 2015-04-30 150

I want to create a forth column that counts the number of observations in the group. HOWEVER, I want the count to start over if a month doesn't immediately follow the month before. This is what I want it to look like:

  group       date obs num
1     A 2014-12-31 100   1
2     A 2015-01-31 200   2
3     A 2015-02-28 300   3
4     B 2015-01-31  50   1
5     B 2015-03-31 100   1
6     B 2015-04-30 150   2

So far all I can get is the following:

library(tidyverse)
sample <- sample %>%
  arrange(date) %>%
  group_by(group) %>%
  mutate(num = row_number())

  group       date obs num
1     A 2014-12-31 100   1
2     A 2015-01-31 200   2
3     A 2015-02-28 300   3
4     B 2015-01-31  50   1
5     B 2015-03-31 100   2
6     B 2015-04-30 150   3

Any help would be much appreciated. I also want to be able to do the same thing but with quarterly data (instead of monthly).

Upvotes: 0

Views: 195

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388907

We can use lubridate::days_in_month to get number of days in a month compare it with difference of current and past date to create a new group. We can then assign row_number() in each group.

library(dplyr)

sample %>%
  group_by(group) %>%
  mutate(diff_days = cumsum(as.numeric(date - lag(date, default = first(date))) != 
                     lubridate::days_in_month(date))) %>%
  group_by(diff_days, add = TRUE) %>%
  mutate(num = row_number()) %>%
  ungroup() %>%
  select(-diff_days)


# group  date         obs   num
#  <fct> <date>     <dbl> <int>
#1 A     2014-12-31   100     1
#2 A     2015-01-31   200     2
#3 A     2015-02-28   300     3
#4 B     2015-01-31    50     1
#5 B     2015-03-31   100     1
#6 B     2015-04-30   150     2

Upvotes: 1

akrun
akrun

Reputation: 887048

We can create a group based on the differnece of month of 'date' and if it is not equal to 1 i.e. one month difference

library(dplyr)
library(lubridate)
sample %>%
     arrange(group, date) %>% 
     group_by(group, mth = cumsum(c(TRUE, diff(month(date)) != 1))) %>% 
     mutate(num = row_number()) %>%
     ungroup %>%
     select(-mth)
# A tibble: 6 x 4
#  group date         obs   num
#  <fct> <date>     <dbl> <int>
#1 A     2015-01-31   100     1
#2 A     2015-02-28   200     2
#3 A     2015-03-31   300     3
#4 B     2015-01-31    50     1
#5 B     2015-03-31   100     1
#6 B     2015-04-30   150     2

If the year also needs to be considered

library(zoo)
sample %>%
    arrange(group, date) %>% 
    mutate(yearmon = as.yearmon(date)) %>% 
    group_by(group) %>%
    group_by(grp = cumsum(c(TRUE, as.integer(diff(yearmon) * 12)> 1)),
             add = TRUE ) %>% 
    mutate(num = row_number()) %>% 
    ungroup %>% 
    select(-grp, -yearmon)
# A tibble: 6 x 4
#  group date         obs   num
#  <fct> <date>     <dbl> <int>
#1 A     2015-01-31   100     1
#2 A     2015-02-28   200     2
#3 A     2015-03-31   300     3
#4 B     2015-01-31    50     1
#5 B     2015-03-31   100     1
#6 B     2015-04-30   150     2

Upvotes: 0

Related Questions