ChrisL
ChrisL

Reputation: 73

Use subquery in select statement in Tableau

I have three tables with the following structures: 1) patients (id, name) 2) illnesses (id, name) 3) patient_illnesses (patient_id, illness_id)

In Tableau, I'd like to query for the illnesses that patients have for a select group of patients (for instance, those that do not have cancer).

IN SQL, this would look like.

SELECT * FROM patients 
INNER JOIN patient_illnesses ON patients.id = patient_illnesses.patient_id 
INNER JOIN illnesses ON illnesses.id = patient_illnesses.illness_id 
WHERE NOT patients.id NOT IN (
SELECT patients.id FROM patients
INNER JOIN patient_illnesses ON patients.id = patient_illnesses.patient_id 
WHERE patient_illnesses.illness_id = 4 
GROUP BY patients.id
)

In this example, illness_id 4 refers to a specific illness, but I would like to be able to multiselect filter in Tableau on this.

I'm able to accomplish this in Tableau with custom SQL, but I can't get the filter to work when I use custom SQL.

Any thoughts?

Upvotes: 1

Views: 4118

Answers (1)

Alex Blakemore
Alex Blakemore

Reputation: 11896

Avoid custom SQL, except in the increasingly rare case where you need to access some specialized database specific function. Besides not being very flexible, it prevents Tableau from performing some query optimizations.

There are multiple solutions. For this one, I created a connection that joined all three tables together to get one row per patient/disease. Then I hid the id fields in Tableau and renamed the name fields to be descriptive of what tables they came from.

enter image description here

Then I defined a set of patients defined by a condition -- in this case, the condition is that the patient must not have some specific disease. The disease in question is provided by a parameter. Sets are pretty useful for this particular case, but you could use an LOD calculation instead or a pair of data sources with an action filter or other approaches.

The condition takes a little care to understand, but it is simply a boolean expression that takes a set of records (that have the same patient id) and returns true or false to indicate whether that patient should belong to the set. It helps to know that the min and max functions treat True values as greater than false values. So MIN() is True only if is True for every single data row in the batch being considered.

enter image description here

Finally, here it all is on a dashboard, along with a cross tab view of the data for error checking.

enter image description here

enter image description here

Upvotes: 2

Related Questions