delmalki
delmalki

Reputation: 1364

Multiple SubSelect Inner Joins Mysql

I am trying to create multiple Z Scores [(value-AVG)/STD] within one query and filter them.

So far, I came up with this; before you look at it, however, theres some french words, "quartier" is like a district/borough and "prix" is a price.

SELECT ((t1.prix-A1.HMEAN)*100/A1.STD_SAMPLE) AS Z,
q.myId AS q_myId
FROM immobilier_ad_blank AS t1 
LEFT JOIN Adresse AS a ON t1.adresse_id=a.id
LEFT JOIN Quartier AS q ON a.quartier_id=q.id
CROSS JOIN (
    SELECT AVG(t2.prix) AS HMEAN, 
    STD(t2.prix) AS STD_SAMPLE,
    q.myId AS quartier_myId
    FROM immobilier_ad_blank AS t2
    LEFT JOIN Adresse AS a ON t2.adresse_id=a.id
    LEFT JOIN Quartier AS q ON a.quartier_id=q.id
    GROUP BY quartier_myId
) AS A1 ON q.myId = A1.quartier_myId

It seems to work, however I'm not sure if Im doing this right, I only added one filtering condition but I will add up to 6 conditions and so far this method seems quite cumbersome for what it does.

Ultimately it would look like this

SELECT 
((t1.prix-A1.HMEAN)*100/A1.STD_SAMPLE) AS Z1,
((t1.prix-A2.HMEAN)*100/A1.STD_SAMPLE) AS Z2,
((t1.prix-AN.HMEAN)*100/A1.STD_SAMPLE) AS ZN,
All Relevant selects
FROM immobilier_ad_blank AS t1 
All Relevant LEFT Joins
CROSS JOIN (
    SELECT AVG(t2.prix) AS HMEAN, 
    STD(t2.prix) AS STD_SAMPLE,
    q.myId AS quartier_myId
    FROM immobilier_ad_blank AS t2
    LEFT JOIN Adresse AS a ON t2.adresse_id=a.id
    LEFT JOIN Quartier AS q ON a.quartier_id=q.id
    GROUP BY quartier_myId
) AS A1 ON q.myId = A1.quartier_myId
CROSS JOIN (
    SELECT AVG(t2.prix) AS HMEAN, 
    STD(t2.prix) AS STD_SAMPLE,
    q.myId AS quartier_myId
    FROM immobilier_ad_blank AS t3
    Other Joins
    Group By Other conditions
) AS A2 ON OtherConditions
CROSS JOIN (
    SELECT AVG(tN.prix) AS HMEAN, 
    STD(tN.prix) AS STD_SAMPLE,
    q.myId AS quartier_myId
    FROM immobilier_ad_blank AS tN
    Other Joins
    Group By Other conditions
) AS AN ON OtherConditions

This is in mysql, according to manual cross join = inner join = join. However from what I understood cross join is table X table, but in my case since it returns aggregates, I suppose its not t^2 but t*(number of aggregates) right ?

AM I right on the right track ? Optimization Wise ? Did I put my filtering at the right place ?

I basically wanna select all rows, find their averages function of their join values, calculate their Z Scores, and add them up within less than 5 sec and with around 1M data.

EDIT 1:

I'll simplify my question,

SELECT 
((t1.prix-A1.HMEAN)*100/A1.STD_SAMPLE) AS Z,
((t1.prix-A2.HMEAN)*100/A2.STD_SAMPLE) AS Z2,
q.myId AS q_myId,
c.myId AS c_myId,
s.myId as size_myId
FROM immobilier_ad_blank AS t1 
LEFT JOIN Size AS s ON t1.size_id=s.id
LEFT JOIN Adresse AS a ON t1.adresse_id=a.id
LEFT JOIN City AS c ON a.city_id=c.id
LEFT JOIN Quartier AS q ON a.quartier_id=q.id
CROSS JOIN (
    SELECT AVG(t2.prix) AS HMEAN, 
    STD(t2.prix) AS STD_SAMPLE,
    q.myId AS quartier_myId
    s.myId as size_myId
    FROM immobilier_ad_blank AS t2
    LEFT JOIN Size AS s ON t2.size_id=s.id
    LEFT JOIN Adresse AS a ON t2.adresse_id=a.id
    LEFT JOIN Quartier AS q ON a.quartier_id=q.id
    GROUP BY quartier_myId, size_myId
) AS A1 ON q.myId = A1.quartier_myId
AS A2 on s.myId=A2.size_myId #<--------- Is this line possible ?

EDIT 3:

I ended up using temporary tables and duplicating them because views in my case were way slower than temp tables even though the underlying data was correctly indexed.

CREATE TEMPORARY TABLE IF NOT EXISTS A1 AS (
SELECT AVG(t2.prix) AS HMEAN, 
    STD(t2.prix) AS STD_SAMPLE,
    s.myId AS size_myId,
    q.myId AS quartier_myId
    FROM immobilier_ad_blank AS t2
    LEFT JOIN Size AS s ON t2.size_id=s.id
    LEFT JOIN Adresse AS a ON t2.adresse_id=a.id
    LEFT JOIN Quartier AS q ON a.quartier_id=q.id
    GROUP BY quartier_myId,size_myId);

CREATE TEMPORARY TABLE A2 LIKE A1; 
INSERT A2 SELECT * FROM A1;

SELECT 
((c.prix-A1.HMEAN)*100/A1.STD_SAMPLE) AS Z1,
((c.prix-A2.HMEAN)*100/A2.STD_SAMPLE) AS Z2,
q.myId AS quartier_myId,
s.myId AS size_myId
FROM immobilier_ad_blank AS c
LEFT JOIN Size AS s ON c.size_id=s.id
LEFT JOIN Adresse AS ad ON c.adresse_id=ad.id
LEFT JOIN Quartier AS q ON ad.quartier_id=q.id
JOIN A1 on A1.quartier_myId = q.myId
JOIN A2 AS A2 on A2.size_myId = s.myId

Basically, joining the same virtual table (subquery) === Views || Temp Tables And to be honest, I might end up creating a permanent table and update it once a day....

Upvotes: 2

Views: 73

Answers (1)

O. Jones
O. Jones

Reputation: 108686

CROSS JOIN ... ON condition makes the cross join into an ordinary INNER JOIN or, for brevity, just JOIN. So don't worry about the combinatorial explosion. A correctly written ON condition prevents it.In short, forget about CROSS JOIN. Just write JOIN.

You can write your JOIN as something like this

 FROM table_a AS a
 JOIN table_b AS b ON a.id = b.id

Or, you can express either table_a or table_b, or both, as subqueries. For example, you can write this query.

 FROM table_A AS a
 JOIN (
         SELECT MAX(id) id, district
           FROM table_b
          GROUP BY district
      ) AS b on a.id = b.id

In other words, you can use a physical table (table_b) or a virtual table (a subquery) interchangeably.

In your case, your subquery is a four-column table: (HMEAN, STD_SAMPLE, quartier_myId, and size_myId) .

How will you join that table to the rest of the tables in your query?

You have the code

  JOIN (subquery) AS A1 ON q.myId = A1.quartier_myId 

But, to use the fourth column of your subquery (size_myId) you need to use it also for the JOIN. To do this, put an AND in your ON clause. Do something like this:

  JOIN (subquery) AS A1   ON q.myId = A1.quartier_myId 
                         AND s.myId = A.size_myId

If the meaning of s.myId is correct, that should give you a useful result.

Compound ON clauses are very useful. You can do such things as this:

  LEFT JOIN (subquery) AS A1   ON q.myId = A1.quartier_myId 
                              AND s.myId = A.size_myId
                              AND A.HMEAN > 7.5

to filter your results.

Upvotes: 2

Related Questions