Alex
Alex

Reputation: 13

Looping through Months and counting rows

I have a df with three columns: StartDate, EndDate and SubscriptionType.

df = data.frame(StartDate = as.Date(c('2018-05-01', '2018-06-01', '2018-01-01', '2018-07-01', '2018-03-01')), EndDate = as.Date(c('2019-04-30', '2019-05-31', '2018-12-31', '2019-06-30', '2019-02-28')), SubscriptionType = c('monthly', 'monthly', 'yearly', 'yearly', 'yearly'))

As example, by using this code:

df %>% filter(StartDate <= ymd('2018-5-15') & EndDate >= ymd('2018-5-15')) %>% count()

I get how many Subscriptions are active in the month of May.

I would like to print the results for every month in 2018 and store it in a data frame.

So far, I tried using this code:

z = NULL

m = c(01,02,03,04,05,06,07,08,09,10,11,12)

for (i in m) {z = rbind(z, data.frame(df %>% filter(StartDate <= ymd('2018-i-15') & EndDate >= ymd('2018-i-15')) %>% count()))}

but what I get are 24 warnings and a data frame z populated only with zeros.

Any help will be appreciated, thanks!

Upvotes: 1

Views: 1527

Answers (4)

vhcandido
vhcandido

Reputation: 354

Improving @akrun's answer, that provided the simplest and direct solution to the looping problem.

However, if you expect to get TRUE for any month overlapping the interval between two dates it'd be better to convert them to months instead of using some fixed day. You can test it for some row containing '2018-05-16' and '2018-06-14' (start and end dates, respectively), it wouldn't count for '2018-05' neither for '2018-06' with the original condition.

f <- '%Y-%m'  # it'll be used a few times

m <- seq(ymd(180101), ymd(191231), by='month')
m <- format(m, f)  # 'e.g. 2018-05'

# purrr::map_dfr() works like apply, it'll map each value from m into .x
# inside the expression and use dplyr::bind_rows() to concatenate the
# resulting rows (see documentation for details).
map_df(m, ~ df %>%
        filter(format(StartDate, f) <=  .x &
                format(EndDate, f)  >= .x) %>%
        group_by(month = .x) %>%
        count()
)

# For the data.frame you provided this is the result:
## A tibble: 18 x 2
## Groups:   month [18]
#   month       n
#   <chr>   <int>
# 1 2018-01     1
# 2 2018-02     1
# 3 2018-03     2
# 4 2018-04     2
# 5 2018-05     3
# 6 2018-06     4
# 7 2018-07     5
# 8 2018-08     5
# 9 2018-09     5
#10 2018-10     5
#11 2018-11     5
#12 2018-12     5
#13 2019-01     4
#14 2019-02     4
#15 2019-03     3
#16 2019-04     3
#17 2019-05     2
#18 2019-06     2  

Upvotes: 1

DeduciveR
DeduciveR

Reputation: 1702

A solution with dplyr.

Some example data to get a data frame of dates :

library(tidyverse)
library(lubridate)

df <- tibble(as.Date(c("2018-05-02", "2018-05-03", "2018-05-04", "2018-05-04", "2018-05-02", "2018-06-03", "2018-06-04", "2018-07-04", "2018-07-04"))) %>%
    rename(Date = 1)

Add a column that specifies the numeric month:

df <- df %>% 
mutate(Month = month(Date))

Create a data frame of numeric months (1-12) and give it a blank column for subscription count called 'subs'.

subs_by_month <- as.tibble(1:12) %>% 
    rename(Month = 1)
subs_by_month$subs <- NA

Loop through using tally to count the number of observations:

for(i in 1:12){
  subs_by_month$subs[[i]] <- unlist(
    df %>%
    tally(Month==i)
    )
}

The resulting frame has the numeric month and number of subs per that month.

Upvotes: 0

akrun
akrun

Reputation: 886938

We can group use map to do this. Create the 'm' as a Date class object with paste and then filter the dataset based on the condition and apply the count

m <- ymd(paste0('2015-', sprintf('%02d', 1:12), '-15'))
map_df(m, ~ 
       df %>%      
         filter(Start_Date <=  .x & EndDate  >= .x) %>% 
         count)

Upvotes: 1

phil_t
phil_t

Reputation: 861

ymd('2018-i-15') does not a yield a date object. You have passed ymd() a string that has 2018 as year, i as month and 15 as day. The i in this case is fixed, and not replaced by the loop variable i. This is why you get the error All formats failed to parse. No formats found., which is essentially telling you it cannot identify a date with i as the month.

To keep i variable, try ymd(paste0("2018-", i, "-15")).

for (i in m) {
     z = rbind(z, data.frame(df %>% filter(StartDate <= ymd(paste0("2018-", i, "-15")) & EndDate >= ymd(paste0("2018-", i, "-15"))) %>% count()))
}

Upvotes: 1

Related Questions