Reputation: 511
I have two tables:
Table 1: A table containing a listing of event occurrences with unique IDs per unique event (event_id
), the date of the event (date_event
), and each person's gender (gender
):
person| event_id| date_event | gender|
----- |---------|------------|-------|
a | 86i | 2012-01-25 | m |
a | 87i | 2012-05-30 | m |
a | 88i | 2012-09-20 | m |
a | 89i | 2012-12-20 | m |
b | 15i | 2015-04-06 | f |
b | 16i | 2016-07-06 | f |
b | 17i | 2016-04-30 | f |
b | 18i | 2016-11-28 | f |
----- |---------|------------|-------|
Table 2: A table of dates listing the start dates (date_start
) and end dates (date_end
), representing ranges of time in which each person was enrolled in a program:
person| date_start | date_end |
----- |------------|------------|
a | 2012-02-05 | 2012-03-30 |
a | 2012-06-26 | 2012-08-28 |
a | 2012-09-15 | 2012-12-31 |
b | 2015-01-24 | 2015-03-30 |
b | 2016-07-01 | 2016-10-01 |
b | 2016-11-25 | 2016-12-30 |
----- |------------|------------|
I would like to be able to filter Table 1 to only contain records of events that occurred when the person was enrolled in a program, which would be specified as date_event
being between date_start
and date_end
. My desired resulting table would look like:
Desired result:
person| event_id| date_event | gender|
----- |---------|------------|-------|
a | 88i | 2012-09-20 | m |
a | 89i | 2012-12-20 | m |
b | 16i | 2016-07-06 | f |
b | 18i | 2016-11-28 | f |
----- |---------|------------|-------|
I have tried several variations of the following to no avail:
CREATE TABLE temptable AS
SELECT *
FROM table1 t1
LEFT JOIN table2 t2
USING(person);
CREATE TABLE desiredresult AS
SELECT *
FROM temptable
WHERE date_event BETWEEN date_start AND date_end
GROUP BY person;
I will admit that I am definitely a SQL novice and am used to using R where I can achieve my desired result using the below code. However, the dataset I'm using is gigantic so I need to figure out how to perform these operations on a SQL server.
library(dplyr)
library(lubridate)
desiredresult <- table1 %>%
left_join(table2) %>%
group_by(person) %>%
mutate(keep_record = date_event %within% interval(start_date, end_date)) %>%
filter(keep_record == TRUE)
Thanks in advance.
Upvotes: 0
Views: 428