John Huang

Reputation: 845

Creating intervals

I have a data set that I would like to split into 10-day intervals. The code that I included below does that, but at the end of a month the leftover days (e.g., the 31st or 30th) end up in an interval by themselves.

I would like to either remove these leftover intervals or merge them into the previous interval.

For example, if I split the month of January into 10-day intervals, it puts the first 10 days into one element of a list, the second 10 days into another element and the third 10 days into another one. It then puts January 31st into an element of the list by itself.
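A minimal sketch of the effect described above. It assumes lubridate's floor_date(), which, for a '10 day' unit, rounds within the month so that every date snaps to day 1, 11, 21 or 31. The question's code uses ceiling_date() instead, so this only illustrates the leftover-day problem rather than reproducing that code exactly:

```r
library(lubridate)

# Every day of January 2011
jan <- seq(ymd("2011-01-01"), ymd("2011-01-31"), by = "day")

# With a '10 day' unit, floor_date() rounds within the month,
# so each date snaps to day 1, 11, 21 or 31 of January
start <- floor_date(jan, "10 day")

# Count the days per interval start: January 31st forms a group on its own
table(day(start))
```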

My desired output would be to either remove these elements from the list or, preferably, include them in the third 10-day interval. Can that be done? If so, what would be the best way to do so?

library(lubridate)
library(tidyverse)
date <- rep_len(seq(dmy("26-12-2010"), dmy("20-12-2013"), by = "days"), 500)
ID <- rep(seq(1, 5), 100)

df <- data.frame(date = date,
                 x = runif(length(date), min = 60000, max = 80000),
                 y = runif(length(date), min = 800000, max = 900000),
                 ID)

int <- df %>%
  arrange(ID) %>% 
  mutate(new = ceiling_date(date, '10 day')) %>% 
  # mutate(cut = data.table::rleid(cut(new, breaks = "10 day"))) %>% 
  group_by(new) %>%
  group_split()

Upvotes: 1

Views: 182

Answers (3)

josep maria porrà

Reputation: 1388

Alternative solution

library(lubridate)
library(tidyverse)

dt <- rep_len(seq(dmy("26-12-2010"), dmy("20-12-2013"), by = "days"), 500)
ID <- rep(seq(1, 5), 100)

df <- data.frame(dt = dt,
  x = runif(length(dt), min = 60000, max = 80000),
  y = runif(length(dt), min = 800000, max = 900000),
  ID)
  1. Include the extra days (the 31st) in the last third of the month
int_df <- df %>%
  # arrange(ID) %>%
  mutate(day_date = day(dt),
    day_new = case_when(
      day_date <= 10  ~ 1,
      day_date <= 20  ~ 11,
      TRUE            ~ 21
    ),
    new = ymd(paste(year(dt), month(dt), day_new, sep = "-"))) %>% 
  select(-c(day_date, day_new)) %>% 
  group_by(new) %>%
  group_split()

int_df[[1]]
#> # A tibble: 6 x 5
#>   dt              x       y    ID new       
#>   <date>      <dbl>   <dbl> <int> <date>    
#> 1 2010-12-26 62395. 837491.     1 2010-12-21
#> 2 2010-12-27 66236. 836481.     2 2010-12-21
#> 3 2010-12-28 79918. 818399.     3 2010-12-21
#> 4 2010-12-29 67613. 807213.     4 2010-12-21
#> 5 2010-12-30 72980. 899380.     5 2010-12-21
#> 6 2010-12-31 61004. 876191.     1 2010-12-21
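The same three-way mapping can also be computed arithmetically instead of with case_when(). This is a minimal sketch, not part of the original answer; third_start() is a hypothetical helper name, and it assumes lubridate's make_date():

```r
library(lubridate)

# Hypothetical helper: map each date to the first day of its third of the month,
# folding day 31 into the third that starts on day 21
third_start <- function(d) {
  idx <- pmin((day(d) - 1) %/% 10, 2)   # 0 for days 1-10, 1 for 11-20, 2 for 21-31
  make_date(year(d), month(d), idx * 10 + 1)
}

d <- ymd(c("2010-12-26", "2010-12-31", "2011-01-11", "2011-02-28"))
third_start(d)
```

Inside the pipeline it could replace the two helper columns, e.g. mutate(new = third_start(dt)), which makes the select(-c(day_date, day_new)) step unnecessary.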
  2. Exclude the extra days (the 31st)
int_df <- df %>%
  # arrange(ID) %>%
  mutate(day_date = day(dt),
    day_new = case_when(
      day_date <= 10  ~ 1,
      day_date <= 20  ~ 11,
      day_date <= 30  ~ 21,
      TRUE            ~ 31
    ),
    new = ymd(paste(year(dt), month(dt), day_new, sep = "-"))) %>% 
  filter(day_date != 31) %>% 
  select(-c(day_date, day_new)) %>%
  group_by(new) %>%
  group_split()

int_df[[1]]
#> # A tibble: 5 x 5
#>   dt              x       y    ID new       
#>   <date>      <dbl>   <dbl> <int> <date>    
#> 1 2010-12-26 62395. 837491.     1 2010-12-21
#> 2 2010-12-27 66236. 836481.     2 2010-12-21
#> 3 2010-12-28 79918. 818399.     3 2010-12-21
#> 4 2010-12-29 67613. 807213.     4 2010-12-21
#> 5 2010-12-30 72980. 899380.     5 2010-12-21

Created on 2021-07-03 by the reprex package (v2.0.0)
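For the exclusion case, a shorter variant is to drop the 31st first and then let floor_date() do the rounding, since every remaining day falls into a third that starts on day 1, 11 or 21. A sketch on a small made-up data frame (the values are illustrative only, not the question's data):

```r
library(dplyr)
library(lubridate)

# Small illustrative data set: 2010-12-21 to 2011-01-12
df_small <- data.frame(dt = seq(ymd("2010-12-21"), ymd("2011-01-12"), by = "day"))
df_small$x <- seq_len(nrow(df_small))

int_small <- df_small %>%
  filter(day(dt) != 31) %>%                  # drop the 31st of every month
  mutate(new = floor_date(dt, "10 day")) %>%  # remaining days snap to day 1, 11 or 21
  group_by(new) %>%
  group_split()

length(int_small)
```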

Upvotes: 0

Uwe

Reputation: 42544

Here is a solution that splits each month into 10-day intervals but corrects new so that day 31 of a month is assigned to the last period. So,

  • days 1 to 10 belong to the first third of a month,
  • days 11 to 20 to the second third, and
  • days 21 to 31 to the third third.
int <- df %>%
  # arrange(ID) %>%   # skipped for readability of result
  mutate(new = floor_date(date, '10 day')) %>%
  mutate(new = if_else(day(new) == 31, new - days(10), new)) %>% 
  group_by(new) %>%
  group_split()

int[[1]]
# A tibble: 6 x 5
  date            x       y    ID new       
  <date>      <dbl>   <dbl> <int> <date>    
1 2010-12-26 71469. 819084.     1 2010-12-21
2 2010-12-27 69417. 893227.     2 2010-12-21
3 2010-12-28 70865. 831341.     3 2010-12-21
4 2010-12-29 68322. 812423.     4 2010-12-21
5 2010-12-30 65643. 837395.     5 2010-12-21
6 2010-12-31 63638. 892200.     1 2010-12-21

Now, 2010-12-31 was assigned to the third third of December.

Note that new indicates the start of the interval because floor_date() is called instead of ceiling_date(). This avoids potential problems with day arithmetic across month boundaries and makes clear which month the interval belongs to. For instance, for the last day of February, ceiling_date(ymd('2011-02-28'), '10 day') returns "2011-03-03", which is a date in March.
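A quick check of the floor_date() plus correction step on two month-end dates, as a minimal sketch assuming lubridate and dplyr are loaded; the interval start stays in the same month as the original date:

```r
library(dplyr)
library(lubridate)

d <- ymd(c("2011-01-31", "2011-02-28"))

# floor_date() gives day 31 for January 31st and day 21 for February 28th
start <- floor_date(d, "10 day")

# Same correction as in the answer: move a day-31 start back to day 21
start <- if_else(day(start) == 31, start - days(10), start)

start
month(start) == month(d)
```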

Upvotes: 2

Ronak Shah

Reputation: 388797

If a group contains only a single row, give it the previous new value. Try this:

library(dplyr)
library(lubridate)

df %>%
  arrange(ID, date) %>% 
  mutate(new = ceiling_date(date, '10 day')) %>%
  add_count(new) %>%
  mutate(new = if_else(n == 1, lag(new), new)) %>%
  select(-n) %>%
  group_split(new)
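The core of this idea can be checked on a single month. This sketch swaps in floor_date() so that January 31st ends up alone (the answer itself uses ceiling_date()), and it assumes the rows are already sorted by date so that lag() refers to the preceding interval:

```r
library(dplyr)
library(lubridate)

toy <- tibble(date = seq(ymd("2011-01-21"), ymd("2011-01-31"), by = "day")) %>%
  mutate(new = floor_date(date, "10 day")) %>%   # January 31st gets its own start
  add_count(new) %>%                              # n = number of rows sharing a start
  mutate(new = if_else(n == 1, lag(new), new))    # a singleton takes the previous start

n_distinct(toy$new)
```

If the very first row of the data were a singleton, lag() would return NA for it, so that edge case needs separate handling.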

The code above only merges groups that contain a single observation. To merge groups that span more than one day, use the code below, which counts the number of distinct days in each group and merges a group into the previous one if it has fewer than n days.

n <- 2  # merge any interval that spans fewer than n distinct days

tmp <- df %>%
  arrange(ID, date) %>%
  mutate(new = ceiling_date(date, '10 day'),
         grp = match(new, unique(new)))    # integer key per interval; keeps ID intact

tmp %>%
  group_by(new, grp) %>%
  summarise(count_unique = n_distinct(date), .groups = "drop") %>%
  mutate(new = if_else(count_unique < n, lag(new), new)) %>%  # borrow the previous interval
  inner_join(tmp, by = 'grp') %>%
  select(new = new.x, date, ID, x, y) %>%
  group_split(new)

Upvotes: 2
