Reputation: 1397
In our project we have a table events
each of which can have multiple dates
. This means each row in dates
has a event_id
and a position
where position
(starting at 0) is unique within the same event_id
. dates
also (obviously) has a date
column.
Now I want to select all events
which have a specific list of date
child values (ordered by position
). Is this possible within a single SQL query?
Example Tables
events: (id, name, event_type, description, comment, user_id)
dates: (id, event_id, position, date_at, latitude, longitude, location)
id
, position and lat/long columns are integer, date_at
is a DATE, the rest are strings (character varying) which don't really matter here. For example:
INSERT INTO events (id, name) VALUES (1, 'Birthday Party');
INSERT INTO dates (id, event_id, position, date_at) VALUES ( 1, 1, 0, '2014-06-30');
INSERT INTO dates (id, event_id, position, date_at) VALUES ( 2, 1, 1, '2015-06-30');
INSERT INTO dates (id, event_id, position, date_at) VALUES ( 3, 1, 2, '2016-06-30');
INSERT INTO dates (id, event_id, position, date_at) VALUES ( 4, 1, 3, '2017-06-30');
INSERT INTO dates (id, event_id, position, date_at) VALUES ( 5, 1, 4, '2018-06-30');
INSERT INTO events (id, name) VALUES (2, 'S.O. Birthday Party');
INSERT INTO dates (id, event_id, position, date_at) VALUES ( 6, 2, 0, '2014-02-11');
INSERT INTO dates (id, event_id, position, date_at) VALUES ( 7, 2, 1, '2015-02-11');
INSERT INTO dates (id, event_id, position, date_at) VALUES ( 8, 2, 2, '2016-02-11');
INSERT INTO dates (id, event_id, position, date_at) VALUES ( 9, 2, 3, '2017-02-11');
INSERT INTO dates (id, event_id, position, date_at) VALUES (10, 2, 4, '2018-02-11');
I need a query that returns Event #1
when I feed it this array of dates: [2014-06-30, 2015-06-30, 2016-06-30, 2017-06-30, 2018-06-30]
. The whole query string array will be automatically generated, so it can have any format and can be hardcoded in this example (we use Ruby on Rails). The list of possible events will have other restrictions (user_id, event_type, etc) so it won't be a long list (perhaps 10..100 events), so it need not be an indexable query.
It should not return Event #1 when fed these dates in another order or other dates or a subset of these dates. It should accept any number of dates as input, including an empty set.
Upvotes: 0
Views: 1241
Reputation: 86765
I'm going to assume for the moment that you have created an additional table in which to store you list of specific dates
SELECT
event.*
FROM
events
INNER JOIN
(
SELECT
d.event_id
FROM
dates AS d
INNER JOIN
list_of_dates AS l
ON l.date = d.date
GROUP BY
d.event_id
HAVING
COUNT(*) = (SELECT COUNT(*) FROM list_of_dates)
AND MAX(d.position) = MIN(d.position) + COUNT(*) - 1
)
AS list
ON list.event_id = events.id
This joins the dates
table to the list of dates that you have.
It then groups the results by event_id
.
The HAVING
clause then only allows through an event_id
if the returned number of dates matches the number in the list. (If I join 5 dates, I want only the event_id
s that include all 5 of those dates).
The HAVING
clause then ensure that all of those dates are sequential. If I have a list of 5 dates, the MAX(position)
should be 4 more than the MIN(position)
I'm not actually sure if that last criteria is what you wanted, but hopefully it will give you a structure to work with for the criteria that you do need.
Also, note that this will require a full table scan. This type of multi-row search is always slow in SQL. (Academically, it's the same as searching an Entity-Attribute-Value table.)
EDIT: In response to question edit.
If hard-coding the dates into the query, you could do the following, but you must be crucially aware of the possibility of SQL Injection Attacks if any part of the query is controlled by the user.
SELECT
event.*
FROM
events
INNER JOIN
(
SELECT
event_id
FROM
dates AS d
WHERE
date IN ('2014-06-30', '2015-06-30', '2016-06-30', '2017-06-30', '2018-06-30')
GROUP BY
event_id
HAVING
COUNT(*) = 5
AND MAX(d.position) = MIN(d.position) + 5 - 1
)
AS list
ON list.event_id = events.id
You would hard code both the list of dates and the two occurrences of the 5
.
With regards to you requirement of so it need not be an indexable query
, you will find that this is in fact impossible, to a degree. I'll try to explain why...
There is no mechanism to interrogate multiple rows together and at the same time. You do get simultaneously, but also individually. What I mean by that is that you can check is the date on this row in my list of dates
? What you can't do is to say are all of these dates in my list
.
The query above checks each row one at a time, is this date in my list, then calculates how many matched and were they all in order in the HAVING clause. If any event has any one of those dates you still need to process that date, then process the HAVING clause and only then realise that not all of the dates were present.
Each individual is this date in my list can be indexable
, but the final HAVING clause will be a full scan of those matches. The result is that this does not scale particularly well at all. And there is very little that can be done about that.
Depending on the statistical structure of your data, you may be able to make minor, and quite esoteric, optimisations. For example...
INSERT INTO
dates (id, event_id, first_position, date0, date1, date2, date3, date4, date5)
VALUES
( 1, 1, 0, '2014-06-30', '2015-06-30', '2016-06-30', '2017-06-30', '2018-06-30', NULL ),
( 1, 1, 1, '2015-06-30', '2016-06-30', '2017-06-30', '2018-06-30', NULL, NULL ),
( 1, 1, 2, '2016-06-30', '2017-06-30', '2018-06-30', NULL, NULL, NULL ),
( 1, 1, 3, '2017-06-30', '2018-06-30', NULL, NULL, NULL, NULL ),
( 1, 1, 4, '2018-06-30', NULL, NULL, NULL, NULL, NULL )
;
SELECT
events.*
FROM
events
INNER JOIN
dates
ON dates.event_id = events.id
WHERE
dates.date0 = '2015-06-30'
AND dates.date1 = '2016-06-30'
AND dates.date2 = '2017-06-30'
;
Now, you could have an index on (date0, event_id)
and quickly find all the events that have '2015-06-30'
as one of the dates. You would also only get back one row from the dates
table, and can quickly check if all of the other dates are also present.
It's dirty though. If an event could have 100 associated dates, you'd need 100 date fields. That also makes in inflexible (you might provision 128 dates, then in the future find you need 129). It also makes writing queries against it a bit messy (you're not passing parameters, you're writing a WHERE clause).
The index seek on date0
is still going to return rows that are later discarded by the remained of the where clause, but it's very likely that you'll get fewer of these cases than by using the HAVING clause method; so it should be faster. Unless you have so many columns that the extra i/o of reading so many redundant columns counters your savings. Over-all I'd still expect this to be measurably faster, and yet I'd still be very apprehensive of trying it.
Unfortunately the concept of searching multiple rows within grouped rows is always going to be messy and/or costly to execute.
Upvotes: 1