Alexis

Reputation: 2294

Create dataframe with month start and end in R

I want to create a dataframe from a given start and end date:

start_date <- as.Date("2020-05-17")
end_date <- as.Date("2020-06-23")

Each row should cover one calendar month of that range, giving the first and last day of the month that fall inside it, so the expected output is:

start       end         month   year
2020-05-17  2020-05-31  May     2020
2020-06-01  2020-06-23  June    2020

I have tried to create a sequence, but I'm stuck on what to do next:

day_seq <- seq(start_date, end_date, 1)

A base R or tidyverse solution would be greatly appreciated.

Upvotes: 4

Views: 3700

Answers (4)

G. Grothendieck

Reputation: 269491

1) yearmon Using start_date and end_date from the question, create a yearmon sequence; each of the desired columns is then a simple one-line computation. The stringsAsFactors argument can be omitted from R 4.0 onwards, where FALSE is the default.

library(zoo)

ym <- seq(as.yearmon(start_date), as.yearmon(end_date), 1/12)

data.frame(start = pmax(start_date, as.Date(ym)),
           end = pmin(end_date, as.Date(ym, frac = 1)),
           month = month.name[cycle(ym)],
           year = as.integer(ym),
           stringsAsFactors = FALSE)

giving:

       start        end month year
1 2020-05-17 2020-05-31   May 2020
2 2020-06-01 2020-06-23  June 2020

2) Base R This follows similar logic and gives the same answer. We first define a function month1 which, given a Date class vector x, returns a Date vector of the same length holding the first day of each date's month.

month1 <- function(x) as.Date(cut(x, "month"))

months <- seq(month1(start_date), month1(end_date), "month")
data.frame(start = pmax(start_date, months),
           end = pmin(end_date, month1(months + 31) - 1),
           month = format(months, "%B"),
           year = as.numeric(format(months, "%Y")),
           stringsAsFactors = FALSE)
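
The month1(months + 31) - 1 step works because adding 31 days to the first of any month always lands in the following month, so truncating back to the first and subtracting one day yields the last day of the original month. A small base R check of that idea, reusing month1 from above (the name last_day is only illustrative and is not part of the answer):

```r
# Truncate each date to the first day of its month
month1 <- function(x) as.Date(cut(x, "month"))

# Illustrative helper: last day of the month that contains x
last_day <- function(x) month1(month1(x) + 31) - 1

# A leap-year February, an ordinary February, and a December
firsts <- as.Date(c("2020-02-01", "2021-02-01", "2020-12-01"))
last_day(firsts)
# [1] "2020-02-29" "2021-02-28" "2020-12-31"
```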

Upvotes: 3

Ben

Reputation: 30474

Here is one approach using intervals with lubridate. You would create a full interval between the 2 dates of interest, and then intersect with monthly ranges for each month (first to last day each month).

library(tidyverse)
library(lubridate)

start_date <- as.Date("2020-05-17")
end_date <- as.Date("2021-08-23")

full_int <- interval(start_date, end_date)

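# Anchor the sequence on the first of each month so that no month is skipped
# when end_date falls on an earlier day of the month than start_date
month_seq <- seq(floor_date(start_date, "month"), floor_date(end_date, "month"), by = "month")
month_int <- interval(month_seq, month_seq + months(1) - days(1))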

data.frame(interval = intersect(full_int, month_int)) %>%
  mutate(start = int_start(interval),
         end = int_end(interval),
         month = month.abb[month(start)],
         year = year(start)) %>%
  select(-interval)

Output

        start        end month year
1  2020-05-17 2020-05-31   May 2020
2  2020-06-01 2020-06-30   Jun 2020
3  2020-07-01 2020-07-31   Jul 2020
4  2020-08-01 2020-08-31   Aug 2020
5  2020-09-01 2020-09-30   Sep 2020
6  2020-10-01 2020-10-31   Oct 2020
7  2020-11-01 2020-11-30   Nov 2020
8  2020-12-01 2020-12-31   Dec 2020
9  2021-01-01 2021-01-31   Jan 2021
10 2021-02-01 2021-02-28   Feb 2021
11 2021-03-01 2021-03-31   Mar 2021
12 2021-04-01 2021-04-30   Apr 2021
13 2021-05-01 2021-05-31   May 2021
14 2021-06-01 2021-06-30   Jun 2021
15 2021-07-01 2021-07-31   Jul 2021
16 2021-08-01 2021-08-23   Aug 2021
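
Intersecting the full interval with each calendar month amounts to clamping every month's first and last day to the overall range. A rough base R sketch of the same idea without lubridate follows; the function name clip_months is made up for illustration, and it assumes start_date <= end_date:

```r
# Hypothetical helper: one row per calendar month touched by [start_date, end_date]
clip_months <- function(start_date, end_date) {
  # First day of the month containing each endpoint
  first_start <- as.Date(format(start_date, "%Y-%m-01"))
  first_end   <- as.Date(format(end_date, "%Y-%m-01"))

  # First day of every month in the range
  month_start <- seq(first_start, first_end, by = "month")
  # Last day of each month: first day of the following month, minus one day
  next_month <- seq(first_start, by = "month",
                    length.out = length(month_start) + 1)[-1]
  month_end  <- next_month - 1

  # Clamp each month to the overall range
  data.frame(start = pmax(start_date, month_start),
             end   = pmin(end_date, month_end),
             month = month.abb[as.integer(format(month_start, "%m"))],
             year  = as.integer(format(month_start, "%Y")),
             stringsAsFactors = FALSE)
}

# The end day (10) is earlier in the month than the start day (17),
# and June still gets its own row
clip_months(as.Date("2020-05-17"), as.Date("2020-06-10"))
```

Building the sequence from the first of each month, rather than from start_date itself, keeps the result independent of which day of the month the range starts on.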

Upvotes: 2

Claudio Secco

Reputation: 41

For the specific period in your question, which spans exactly two consecutive months, you may use:

library(lubridate)

start_date <- as.Date("2020-05-17")
end_date <- as.Date("2020-06-23")

start <- c(start_date, floor_date(end_date, unit = 'months'))
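# ceiling_date() returns the first day of the next month, so step back one day
end <- c(ceiling_date(start_date, unit = 'months') - days(1), end_date)
# abbr = FALSE gives full month names ("June" rather than "Jun")
month <- as.character(month(start, label = TRUE, abbr = FALSE))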
year <- c(year(start[1]), year(start[2]))

data.frame(start, end, month, year, stringsAsFactors = FALSE)

Upvotes: 2

Wimpel

Reputation: 27732

It has been a while since I used the tidyverse, but here is my attempt.

sample data

Different sample data, to tackle some problems where the year changes:

start_date <- as.Date("2020-05-17")
end_date <- as.Date("2021-06-23")

code

library( tidyverse )
library( lubridate )
#create a sequence of days from start to end
tibble( date = seq( start_date, end_date, by = "1 day" ) ) %>%
  mutate( month = lubridate::month( date ),
          year = lubridate::year( date ),
          end = as.Date( paste( year, month, lubridate::days_in_month(date), sep = "-" ) ) ) %>%
  #the end of the last group is now always later than the maximum date... repair!
  mutate( end = if_else( end > max(date), max(date), end ) ) %>%
  group_by( year, month ) %>%
  summarise( start = min( date ), 
             end = max( end ) ) %>%
  select( start, end, month, year )

output

# # A tibble: 14 x 4
# # Groups:   year [2]
# start      end        month  year
# <date>     <date>     <dbl> <dbl>
# 1 2020-05-17 2020-05-31     5  2020
# 2 2020-06-01 2020-06-30     6  2020
# 3 2020-07-01 2020-07-31     7  2020
# 4 2020-08-01 2020-08-31     8  2020
# 5 2020-09-01 2020-09-30     9  2020
# 6 2020-10-01 2020-10-31    10  2020
# 7 2020-11-01 2020-11-30    11  2020
# 8 2020-12-01 2020-12-31    12  2020
# 9 2021-01-01 2021-01-31     1  2021
# 10 2021-02-01 2021-02-28     2  2021
# 11 2021-03-01 2021-03-31     3  2021
# 12 2021-04-01 2021-04-30     4  2021
# 13 2021-05-01 2021-05-31     5  2021
# 14 2021-06-01 2021-06-23     6  2021
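
The same group-by-month idea can be sketched in base R, starting from the day_seq vector in the question: label each day with its year and month, then pick the first and last day carrying each label. This is an illustrative variant rather than part of the answer above, and it assumes the range is short enough that enumerating every day is acceptable.

```r
start_date <- as.Date("2020-05-17")
end_date <- as.Date("2021-06-23")

# One element per day in the range, in chronological order
day_seq <- seq(start_date, end_date, by = "day")

# Year-month label for every day, e.g. "2020-05"
ym <- format(day_seq, "%Y-%m")
first_idx <- which(!duplicated(ym))                   # first day seen per label
last_idx  <- which(!duplicated(ym, fromLast = TRUE))  # last day seen per label

res <- data.frame(start = day_seq[first_idx],
                  end   = day_seq[last_idx],
                  month = as.integer(format(day_seq[first_idx], "%m")),
                  year  = as.integer(format(day_seq[first_idx], "%Y")))
res
```

Because only days inside the range are enumerated, the last row ends on end_date without a separate repair step.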

Upvotes: 3
