mkpcr
mkpcr

Reputation: 511

PostgreSQL - Filtering join using date ranges by group

I have two tables:

Table 1: A table containing a listing of event occurrences with unique IDs per unique event (event_id), the date of the event (date_event), and each person's gender (gender):

person| event_id| date_event | gender|
----- |---------|------------|-------|
a     |  86i    | 2012-01-25 |   m   |
a     |  87i    | 2012-05-30 |   m   |
a     |  88i    | 2012-09-20 |   m   |
a     |  89i    | 2012-12-20 |   m   |
b     |  15i    | 2015-04-06 |   f   |
b     |  16i    | 2016-07-06 |   f   |
b     |  17i    | 2016-04-30 |   f   |
b     |  18i    | 2016-11-28 |   f   |
----- |---------|------------|-------|

Table 2: A table of dates listing the start dates (date_start) and end dates (date_end), representing ranges of time in which each person was enrolled in a program:

person| date_start | date_end   |
----- |------------|------------|
a     | 2012-02-05 | 2012-03-30 |
a     | 2012-06-26 | 2012-08-28 |
a     | 2012-09-15 | 2012-12-31 |
b     | 2015-01-24 | 2015-03-30 |
b     | 2016-07-01 | 2016-10-01 |
b     | 2016-11-25 | 2016-12-30 |
----- |------------|------------|

I would like to be able to filter Table 1 to only contain records of events that occurred when the person was enrolled in a program, which would be specified as date_event being between date_start and date_end. My desired resulting table would look like:

Desired result:

person| event_id| date_event | gender|
----- |---------|------------|-------|
a     |  88i    | 2012-09-20 |   m   |
a     |  89i    | 2012-12-20 |   m   |
b     |  16i    | 2016-07-06 |   f   |
b     |  18i    | 2016-11-28 |   f   |
----- |---------|------------|-------|

I have tried several variations of the following to no avail:

CREATE TABLE temptable AS
SELECT * 
FROM table1 t1
LEFT JOIN table2 t2 
USING(person);

CREATE TABLE desiredresult AS
SELECT * 
FROM temptable 
WHERE date_event BETWEEN date_start AND date_end
GROUP BY person;

I will admit that I am definitely a SQL novice and am used to using R where I can achieve my desired result using the below code. However, the dataset I'm using is gigantic so I need to figure out how to perform these operations on a SQL server.

library(dplyr)
library(lubridate)

desiredresult <- table1 %>%
    left_join(table2) %>%
    group_by(person) %>%
    mutate(keep_record = date_event %within% interval(start_date, end_date)) %>%
    filter(keep_record == TRUE)

Thanks in advance.

Upvotes: 0

Views: 428

Answers (1)

Belayer
Belayer

Reputation: 14861

There is no reason for creating a temp table, and spend the processing time doing so. Just select directly from the JOIN. (See Demd)

select t1.* 
  from table1 t1
  join table2 t2 
    on (     t1.person = t2.person
         and t1.date_event between t2.date_start and t2.date_end 
       ) ;   

Upvotes: 1

Related Questions