Joshua Zecha
Joshua Zecha

Reputation: 141

Matching values conditioned on overlapping Intervals and ID

I am having difficulties at manipulating the following data structures:

Attributes Data Frame:

ID  Begin_A      End_A        Interval                         Value
5   2017-03-01   2017-03-10   2017-03-01 UTC--2017-03-10 UTC   Cat1
10  2017-12-01   2017-12-02   2017-12-01 UTC--2017-12-02 UTC   Cat2
5   2017-03-01   2017-03-03   2017-03-01 UTC--2017-03-03 UTC   Cat3
10  2017-12-05   2017-12-10   2017-12-05 UTC--2017-12-10 UTC   Cat4

Bookings Data Frame:

ID  Begin_A      End_A        Interval 
5   2017-03-03   2017-03-05   2017-03-03 UTC--2017-03-05 UTC
6   2017-05-03   2017-05-05   2017-05-03 UTC--2017-05-05 UTC
8   2017-03-03   2017-03-05   2017-03-03 UTC--2017-03-05 UTC
10  2017-12-05   2017-12-06   2017-12-05 UTC--2017-12-06 UTC

Desired outcome frame (Bookings):

ID  Begin_A      End_A        Interval                        Attribute_value
5   2017-03-03   2017-03-05   2017-03-03 UTC--2017-03-05 UTC  Cat1,Cat3
6   2017-05-03   2017-05-05   2017-05-03 UTC--2017-05-05 UTC  NA
8   2017-03-03   2017-03-05   2017-03-03 UTC--2017-03-05 UTC  NA
10  2017-12-05   2017-12-06   2017-12-05 UTC--2017-12-06 UTC  Cat4

Code for the data frames:

library(lubridate)
# Attributes data frame:
date1 <- as.Date(c('2017-3-1','2017-12-1','2017-3-1','2017-12-5'))
date2 <- as.Date(c('2017-3-10','2017-12-2','2017-3-3','2017-12-10'))
attributes <- data.frame(matrix(NA,nrow=4, ncol = 5)) 
names(attributes) <- c("ID","Begin_A", "End_A", "Interval", "Value")
attributes$ID <- as.numeric(c(5,10,5,10))
attributes$Begin_A <-date1
attributes$End_A <-date2
attributes$Interval <-attributes$Begin_A %--% attributes$End_A
attributes$Value<- as.character(c("Cat1","Cat2","Cat3","Cat4"))

### Bookings data frame:

date1 <- as.Date(c('2017-3-3','2017-5-3','2017-3-3','2017-12-5'))
date2 <- as.Date(c('2017-3-5','2017-5-5','2017-3-5','2017-12-6'))
bookings <- data.frame(matrix(NA,nrow=4, ncol = 4)) 
names(bookings) <- c("ID","Begin_A", "End_A", "Interval")
bookings$ID <- as.numeric(c(5,6,8,10))
bookings$Begin_A <-date1
bookings$End_A <-date2
bookings$Interval <-bookings$Begin_A %--% bookings$End_A

The procedure to arrive at my outcome frame should be the following: Take the ID from bookings, filter all rows of the attributes data frame where attributes ID matches the booking ID. Check which of the rows with matching attribute ID also have overlapping time intervals (int_overlaps from lubridate). Then take the respective value from the Value column and print each of them in the Attribute_value column.

Upvotes: 0

Views: 104

Answers (1)

www
www

Reputation: 39154

A solution from tidyverse.

library(tidyverse)

attributes2 <- attributes %>%
  select(-Interval) %>%
  gather(Type, Date, ends_with("_A")) %>%
  select(-Type) %>%
  group_by(Value) %>%
  complete(Date = full_seq(Date, period = 1), ID) %>%
  ungroup()

bookings2 <- bookings %>%
  select(-Interval) %>%
  gather(Type, Date, ends_with("_A")) %>%
  select(-Type) %>%
  group_by(ID) %>%
  complete(Date = full_seq(Date, period = 1)) %>%
  ungroup()

bookings3 <- bookings2 %>%
  left_join(attributes2, by = c("ID", "Date")) %>%
  group_by(ID) %>%
  summarise(Attribute_value = toString(sort(unique(Value)))) %>%
  mutate(Attribute_value = ifelse(Attribute_value %in% "", NA, Attribute_value))

bookings4 <- bookings %>% left_join(bookings3, by = "ID")
bookings4
  ID    Begin_A      End_A                       Interval Attribute_value
1  5 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC      Cat1, Cat3
2  6 2017-05-03 2017-05-05 2017-05-03 UTC--2017-05-05 UTC            <NA>
3  8 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC            <NA>
4 10 2017-12-05 2017-12-06 2017-12-05 UTC--2017-12-06 UTC            Cat4

Upvotes: 1

Related Questions