How to structure a MySQL query to join with an exclusion

Question

In short; we are trying to return certain results from one table based on second level criteria of another table.

I have a number of source data tables, So:

Table DataA:

data_id | columns | stuff....
-----------------------------
   1    |  here   | etc.  
   2    |  here   | poop  
   3    |  here   | etc.

Table DataB:

data_id | columnz | various....
-----------------------------
   1    |  there  | you 
   2    |  there  | get  
   3    |  there  | the  
   4    |  there  | idea.

Table DataC:

data_id | column_s | others....
-----------------------------
   1    |  where   | you 
   2    |  where   | get  
   3    |  where   | the  
   4    |  where   | idea.

Table DataD: etc. There are more and more will be added ongoing

And a relational table of visits, where there are "visits" to some of these other data rows in these other tables above.

Each of the above tables holds very different sets of data.

The way this is currently structured is like this:

Visits Table:

visit_id | reference | ref_id | visit_data | columns | notes
-------------------------------------------------------------
    1    |   DataC   |   2    |  some data |  etc.   | so this is a reference 
         |           |        |            |         |   to a visit to row id
         |           |        |            |         |   2 on table DataC  
    2    |   DataC   |   3    |  some data |  etc.   |  ...  
    3    |   DataB   |   4    |  more data |  etc.   | so this is a reference 
         |           |        |            |         |   to a visit to row id
         |           |        |            |         |   4 on table DataB  
    4    |   DataA   |   1    |  more data |  etc.   | etc. etc.  
    5    |   DataA   |   2    |  more data |  etc.   | you get the idea

Now we currently list the visits by various user given criteria, such as visit date. however the user can also choose which tables (ie data types) they want to view, so a user has to tick a box to show they want data from DataA table, and DataC table but not DataB, for example.

The SQL we currently have works like this; the column list in the IN conditional is dynamically generated from user choices:

SELECT visit_id,columns, visit_data, notes
FROM visits 
WHERE visit_date < :maxDate AND visits.reference IN ('DataA','DataC')

The Issue:

Now, we need to go a step beyond this and list the visits by a sub-criteria of one of the "Data" tables, So for example, DataA table has a reference to something else, so now the client wants to list all visits to numerous reference types, and IF the type is DataA then to only count the visits if the data in that table fits a value.

For example:

List all visits to DataB and all visits to DataA where DataA.stuff = poop

The way we currently work this is a secondary SQL on the results of the first visit listing, exampled above. This works but is always returning the full table of DataA when we only want to return a subset of DataA but we can't be exclusive about it outside of DataA.

We can't use LEFT JOIN because that doesn't trim the results as needed, we can't use exclusionary joins (RIGHT / INNER) because that then removes anything from DataC or any other table,

We can't find a way to add queries to the WHERE because again, that would loose any data from any other table that is not DataA.

What we kind of need is a JOIN within an IF/CASE clause.

Pseudo SQL:

SELECT visit_id,columns, visit_data, notes
FROM visits 
IF(visits.reference = 'DataA')
    INNER JOIN DataA ON visits.ref_id = DataA.id AND DataA.stuff = 'poop'
ENDIF 
WHERE visit_date < 2020-12-06 AND visits.reference IN ('DataA','DataC')

All criteria in the WHERE clause are set by the user, none are static (This includes the DataA.stuff criteria too).

So with the above example the output would be:

visit_id | reference | ref_id | visit_data | columns | notes
-------------------------------------------------------------
    1    |   DataC   |   2    |  some data |  etc.   | 
    2    |   DataC   |   3    |  some data |  etc.   | 
    5    |   DataA   |   1    |  more data |  etc.   |

We can't use Union because the different Data tables contain lots of different details.

Questions:

There may be a very straightforward answer to this but I can't see it,

How can we approach trying to achieve this sort of partial exclusivity?
I suspect that our overarching architecture structure here could be improved (the system complexity has grown organically over a number of years). If so, what could be a better way of building this?

Bill Karwin · Accepted Answer

What we kind of need is a JOIN within an IF/CASE clause.

Well, you should know that's not possible in SQL.

Think of this analogy to function calls in a conventional programming language. You're essentially asking for something like:

What we need is a function call that calls a different function depending on the value you pass as a parameter.

As if you could do this:

call $somefunction(argument);

And which $somefunction you call would be determined by the function called, depending on the value of argument. This doesn't make any sense in any programming language.

It is similar in SQL — the tables and columns are fixed at the time the query is parsed. Rows of data are not read until the query is executed. Therefore one can't change the tables depending on the rows executed.

The simplest answer would be that you must run more than one query:

SELECT visit_id,columns, visit_data, notes
FROM visits 
INNER JOIN DataA ON visits.ref_id = DataA.id AND DataA.stuff = 'poop'
WHERE visit_date < 2020-12-06 AND visits.reference = 'DataA';

SELECT visit_id,columns, visit_data, notes
FROM visits 
WHERE visit_date < 2020-12-06 AND visits.reference = 'DataC';

Not every task must be done in one SQL query. If it's too complex or difficult to combine two tasks into one query, then leave them separate and write code in the client application to combine the results.

How to structure a MySQL query to join with an exclusion

The Issue:

Questions:

Answers (1)

Related Questions