Reputation: 79
I have data that looks like this:
sample <- data.frame(
group = c("A","A","A","B","B","B"),
date = c(as.Date("2014-12-31"),
as.Date("2015-01-31"),
as.Date("2015-02-28"),
as.Date("2015-01-31"),
as.Date("2015-03-31"),
as.Date("2015-04-30")),
obs = c(100, 200, 300, 50, 100, 150)
)
Note that the date variable always takes the last date of the month. In table format, the data looks like this:
group date obs
1 A 2014-12-31 100
2 A 2015-01-31 200
3 A 2015-02-28 300
4 B 2015-01-31 50
5 B 2015-03-31 100
6 B 2015-04-30 150
I want to create a forth column that counts the number of observations in the group. HOWEVER, I want the count to start over if a month doesn't immediately follow the month before. This is what I want it to look like:
group date obs num
1 A 2014-12-31 100 1
2 A 2015-01-31 200 2
3 A 2015-02-28 300 3
4 B 2015-01-31 50 1
5 B 2015-03-31 100 1
6 B 2015-04-30 150 2
So far all I can get is the following:
library(tidyverse)
sample <- sample %>%
arrange(date) %>%
group_by(group) %>%
mutate(num = row_number())
group date obs num
1 A 2014-12-31 100 1
2 A 2015-01-31 200 2
3 A 2015-02-28 300 3
4 B 2015-01-31 50 1
5 B 2015-03-31 100 2
6 B 2015-04-30 150 3
Any help would be much appreciated. I also want to be able to do the same thing but with quarterly data (instead of monthly).
Upvotes: 0
Views: 195
Reputation: 388907
We can use lubridate::days_in_month
to get number of days in a month compare it with difference of current and past date
to create a new group. We can then assign row_number()
in each group.
library(dplyr)
sample %>%
group_by(group) %>%
mutate(diff_days = cumsum(as.numeric(date - lag(date, default = first(date))) !=
lubridate::days_in_month(date))) %>%
group_by(diff_days, add = TRUE) %>%
mutate(num = row_number()) %>%
ungroup() %>%
select(-diff_days)
# group date obs num
# <fct> <date> <dbl> <int>
#1 A 2014-12-31 100 1
#2 A 2015-01-31 200 2
#3 A 2015-02-28 300 3
#4 B 2015-01-31 50 1
#5 B 2015-03-31 100 1
#6 B 2015-04-30 150 2
Upvotes: 1
Reputation: 887048
We can create a group based on the diff
ernece of month
of 'date' and if it is not equal to 1 i.e. one month difference
library(dplyr)
library(lubridate)
sample %>%
arrange(group, date) %>%
group_by(group, mth = cumsum(c(TRUE, diff(month(date)) != 1))) %>%
mutate(num = row_number()) %>%
ungroup %>%
select(-mth)
# A tibble: 6 x 4
# group date obs num
# <fct> <date> <dbl> <int>
#1 A 2015-01-31 100 1
#2 A 2015-02-28 200 2
#3 A 2015-03-31 300 3
#4 B 2015-01-31 50 1
#5 B 2015-03-31 100 1
#6 B 2015-04-30 150 2
If the year also needs to be considered
library(zoo)
sample %>%
arrange(group, date) %>%
mutate(yearmon = as.yearmon(date)) %>%
group_by(group) %>%
group_by(grp = cumsum(c(TRUE, as.integer(diff(yearmon) * 12)> 1)),
add = TRUE ) %>%
mutate(num = row_number()) %>%
ungroup %>%
select(-grp, -yearmon)
# A tibble: 6 x 4
# group date obs num
# <fct> <date> <dbl> <int>
#1 A 2015-01-31 100 1
#2 A 2015-02-28 200 2
#3 A 2015-03-31 300 3
#4 B 2015-01-31 50 1
#5 B 2015-03-31 100 1
#6 B 2015-04-30 150 2
Upvotes: 0