Reputation: 13
I have a df with three columns: StartDate, EndDate and SubscriptionType.
df = data.frame(StartDate = as.Date(c('2018-05-01', '2018-06-01', '2018-01-01', '2018-07-01', '2018-03-01')),
EndDate = as.Date(c('2019-04-30', '2019-05-31', '2018-12-31', '2019-06-30', '2019-02-28')),
SubscriptionType = c('monthly', 'monthly', 'yearly', 'yearly', 'yearly'))
As example, by using this code:
df %>% filter(StartDate <= ymd('2018-5-15') & EndDate >= ymd('2018-5-15')) %>% count()
I get how many Subscriptions are active in the month of May.
I would like to print the results for every month in 2018 and store it in a data frame.
So far, I tried using this code:
z = NULL
m = c(01,02,03,04,05,06,07,08,09,10,11,12)
for (i in m) {z = rbind(z, data.frame(df %>% filter(StartDate <= ymd('2018-i-15') & EndDate >= ymd('2018-i-15')) %>% count()))}
but what I get are 24 warnings and a data frame z populated only with zeros.
Any help will be appreciated, thanks!
Upvotes: 1
Views: 1527
Reputation: 354
Improving @akrun's answer, that provided the simplest and direct solution to the looping problem.
However, if you expect to get TRUE
for any month overlapping the interval between two dates it'd be better to convert them to months instead of using some fixed day. You can test it for some row containing '2018-05-16'
and '2018-06-14'
(start and end dates, respectively), it wouldn't count for '2018-05'
neither for '2018-06'
with the original condition.
f <- '%Y-%m' # it'll be used a few times
m <- seq(ymd(180101), ymd(191231), by='month')
m <- format(m, f) # 'e.g. 2018-05'
# purrr::map_dfr() works like apply, it'll map each value from m into .x
# inside the expression and use dplyr::bind_rows() to concatenate the
# resulting rows (see documentation for details).
map_df(m, ~ df %>%
filter(format(StartDate, f) <= .x &
format(EndDate, f) >= .x) %>%
group_by(month = .x) %>%
count()
)
# For the data.frame you provided this is the result:
## A tibble: 18 x 2
## Groups: month [18]
# month n
# <chr> <int>
# 1 2018-01 1
# 2 2018-02 1
# 3 2018-03 2
# 4 2018-04 2
# 5 2018-05 3
# 6 2018-06 4
# 7 2018-07 5
# 8 2018-08 5
# 9 2018-09 5
#10 2018-10 5
#11 2018-11 5
#12 2018-12 5
#13 2019-01 4
#14 2019-02 4
#15 2019-03 3
#16 2019-04 3
#17 2019-05 2
#18 2019-06 2
Upvotes: 1
Reputation: 1702
A solution with dplyr
.
Some example data to get a data frame of dates :
library(tidyverse)
library(lubridate)
df <- tibble(as.Date(c("2018-05-02", "2018-05-03", "2018-05-04", "2018-05-04", "2018-05-02", "2018-06-03", "2018-06-04", "2018-07-04", "2018-07-04"))) %>%
rename(Date = 1)
Add a column that specifies the numeric month:
df <- df %>%
mutate(Month = month(Date))
Create a data frame of numeric months (1-12) and give it a blank column for subscription count called 'subs'.
subs_by_month <- as.tibble(1:12) %>%
rename(Month = 1)
subs_by_month$subs <- NA
Loop through using tally
to count the number of observations:
for(i in 1:12){
subs_by_month$subs[[i]] <- unlist(
df %>%
tally(Month==i)
)
}
The resulting frame has the numeric month and number of subs per that month.
Upvotes: 0
Reputation: 886938
We can group use map
to do this. Create the 'm' as a Date
class object with paste
and then filter
the dataset based on the condition and apply the count
m <- ymd(paste0('2015-', sprintf('%02d', 1:12), '-15'))
map_df(m, ~
df %>%
filter(Start_Date <= .x & EndDate >= .x) %>%
count)
Upvotes: 1
Reputation: 861
ymd('2018-i-15')
does not a yield a date
object. You have passed ymd()
a string that has 2018
as year, i
as month and 15
as day. The i
in this case is fixed, and not replaced by the loop variable i
. This is why you get the error All formats failed to parse. No formats found.
, which is essentially telling you it cannot identify a date with i
as the month.
To keep i
variable, try ymd(paste0("2018-", i, "-15"))
.
for (i in m) {
z = rbind(z, data.frame(df %>% filter(StartDate <= ymd(paste0("2018-", i, "-15")) & EndDate >= ymd(paste0("2018-", i, "-15"))) %>% count()))
}
Upvotes: 1